(57) A method of image decoding of MPEG type signals with the predicted frame (P frame) macroblocks decoded at either full resolution or reduced resolution depending upon assessment of a macroblock. High energy or edge content macroblocks may be decoded at full resolution.
Description

BACKGROUND OF THE INVENTION 

[0001] The invention relates to electronic image methods and devices, and, more particularly, to digital communication and storage systems with compressed images.

[0002] Video communication (television, teleconferencing, Internet, and so forth) typically transmits a stream of video frames (pictures, images) along with audio over a transmission channel for real time viewing and listening or storage. However, transmission channels frequently add corrupting noise and have limited bandwidth. Consequently, digital video transmission with compression enjoys widespread use. In particular, high definition television (HDTV) will use MPEG-2 type compression.

[0003] The MPEG bitstream for a 1920 by 1080 HDTV signal will contain audio plus video I frames, P frames, and B frames. Each I frame includes about 8000 macroblocks with each macroblock made of four 8x8 DCT (discrete cosine transform) luminance blocks and two 8x8 DCT chrominance (red and blue) blocks, although these chrominance blocks may be extended to 16x8 or even 16x16 in higher resolution. Each P frame has up to about 8000 motion vectors with half pixel resolution plus associated residual macroblocks with each macroblock in the form of four 8x8 DCT residual luminance blocks plus two 8x8 DCT chrominance residual blocks. Each B frame has up to about 8000 (pairs of) motion vectors plus associated residual macroblocks with each macroblock in the form of four 8x8 DCT luminance residual blocks plus two 8x8 DCT chrominance residual blocks.

The Federal Communications Commission (FCC) has announced plans for rolling out HDTV standards for the broadcasting industry which will use MPEG-2 coding. In order to maintain backward compatibility with the millions of standard definition televisions (SDTV), an HDTV to SDTV transcoder has been pursued by several investigators. For example, USP 5,262,854 and USP 5,635,985 show conversion of HDTV type signals to low resolution. Transcoders essentially downsample by a factor of 4 (factor of 2 in each dimension) so the 1920 pixel by 1080 pixel HDTV frame becomes a 960 by 540 frame which approximates the 720 by 576 of standard TV. These published approaches include (1) decoding the HDTV signals from frequency domain to spatial domain and then downsampling in the spatial domain and (2) downsampling residuals in the frequency domain, scaling the motion vector, and then performing motion compensation either in the downsampled domain or in the original HDTV domain. However, these transcoders have problems including computational complexity.

[0004] Digital TV systems typically have components for tuning/demodulation, forward error correction, depacketing, variable length decoding, decompression, image memory, and display/VCR. The decompression expected for HDTV essentially decodes an MPEG-2 type bitstream and may include other features such as downconversion for standard TV resolution or VHS recording.

[0005] A broadcast digital HDTV signal will be in the form of MPEG-2 compressed video and audio with error correction coding (e.g., Reed-Solomon) plus run length and variable length coding and in the form of modulation of a carrier in the TV channels. A set-top box front end could include a tuner, a phase-locked loop synthesizer, a quadrature demodulator, an analog-to-digital converter, a variable length decoder, and forward error correction. The MPEG-2 decoder includes inverse DCT and motion compensation plus downsampling if SDTV or other lower resolution is required. USP 5,635,985 illustrates decoders which include downsampling of HDTV to SDTV including a preparser which discards DCT coefficients to simplify the bitstream prior to decoding.

SUMMARY OF THE INVENTION 

[0006] The present invention provides a downsampling for MPEG type bitstreams in the frequency domain and adaptive resolution motion compensation using analysis of macroblocks to selectively use higher resolution motion compensation to deter motion vector drift.

[0007] The present invention also provides video systems with the adaptive higher resolution decoding. 
[0008] A preferred embodiment set-top box for HDTV to SDTV includes the demodulation (tuner, PLL synthesis, IQ demodulation, ADC, VLD, FEC) and MPEG-2 decoding of an incoming high resolution signal with the MPEG-2 decoding including the DCT domain downsampling.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The drawings are schematic for clarity.

Figure 1 depicts a high level functional block diagram of a circuit that forms a portion of the audio-visual system of the present invention;

Figure 2 depicts a portion of Figure 1 and data flow between these portions; 
Figure 3 shows the input timing;

Figure 4 shows the timing of the VARIS output;

Figure 5 shows the timing of 4:2:2 and 4:4:4 digital video output; 

Figure 6 depicts how the data output of PCMOUT alternates between the two channels, as designated by LRCLK;

Figure 7 shows an example circuit where maximum clock jitter will not exceed 200 ps RMS;

Figure 8 (read) and Figure 9 (write) show Extension Bus read and write timing, both with two programmable wait states;

Figure 10 shows the timing diagram of a read with EXTWAIT signal on;

Figure 11 depicts the connection between the circuitry, an external packetizer, Link layer, and Physical layer devices;

Figure 12 shows a functional block diagram of the data flow between the TPP, DES, and 1394 interface;

Figure 13 and Figure 14 depict the read and write timing relationships on the 1394 interface;

Figure 15 shows the data path of ARM processor core; 

Figure 16 depicts the data flow managed by the Traffic Controller;

Figure 17 is an example circuit for the external VCXO;

Figure 18 shows the block diagram of the OSD module;

Figure 19 shows example displays of these two output channels; 

Figure 20 shows an example of the IR input bitstream;

Figure 21 shows a model of the hardware interface;

Figure 22 is a block diagram showing a transcoder and an SDTV decoder according to the present invention connected to a standard definition television set;

Figures 23A and 23B are a flow chart illustrating a transcoding process and a decoding process according to the present invention;

Figure 24 is an illustration of the display format of a standard definition television; 
Figure 25 is a flow diagram which illustrates the operation of the transcoder and decoder of Figure 22;

Figure 26 is a flow diagram which illustrates the flow of Figure 25 in more detail;

Figures 27a-b illustrate the effect of transcoding according to the present invention;

Figure 28 is a block diagram illustrating the transcoder and decoder of Figure 22 in more detail; 

Figure 29 is a block diagram of the transcoder of Figure 22.

Figures 30a-c are a flow diagram for adaptive resolution decoding.

Figure 31 illustrates an adaptive resolution decoder.

Figures 32a-d show differing architectures.

Figure 33 indicates reference blocks in motion compensation. 

[0010] Corresponding numerals and symbols in the different figures and tables refer to corresponding parts unless otherwise indicated.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Overview

[0011] The simplest, but most computation and storage demanding, method for downsampling an HDTV MPEG signal to a resolution comparable to standard TV would be to decode and store the high definition signal at full resolution and downsample to a reduced resolution in the spatial domain for display/output. That is, perform inverse DCT on all the blocks of an I frame to have a full resolution I frame which is stored for subsequent motion compensation plus downsampled for output, perform motion compensation for a P frame using the stored full resolution preceding I (or P) frame plus inverse DCT for the residuals to have a full resolution P frame which is stored for subsequent motion compensation plus downsampled for output, and perform motion compensation for a B frame using the stored full resolution I and/or P frames and inverse DCT residual to have the high definition B frame which is downsampled for output.

[0012] The preferred embodiments limit the computation and/or storage of such high definition MPEG decoding by one or more of the features of downsampling in the DCT domain prior to inverse DCT, adaptive resolution motion compensation with full resolution decoding only for selected macroblocks, and upsampling of stored reduced resolution macroblocks for motion compensation. In particular, the preferred embodiments include:

(1) Full resolution I frames, adaptive resolution P frames, and reduced resolution B frames. 

(2) Adaptive resolution I and P frames and reduced resolution B frames. 

(3) Reduced resolution I frames, adaptive resolution P frames, and reduced resolution B frames. 
The preferred embodiments may extract a 960 by 540 (SDTV) signal from a 1920 by 1080 HDTV bitstream, and the 960 by 540 may be further subsampled and extended to desired sizes such as 720 by 576.

[0013] Figures 30a-c illustrate the P frame macroblock decoding within a preferred embodiment decoder which performs downsampling in the DCT domain for all macroblocks and then selects the macroblocks to fix with full resolution while still processing all macroblocks with reduced resolution; that is, the lefthand and righthand vertical paths in Figures 30a-b are in parallel. Then, prior to display/output, the final output is composed from the two paths. Such a transcoder will always work regardless of the type of input sequences. An alternative is to not process at reduced resolution the macroblocks which are to be fixed; that is, a macroblock traverses either the lefthand or righthand vertical path but not both. This eliminates duplicative computation but demands accurate prediction/scheduling of the computation requirements due to the larger computation to fix macroblocks.

[0014] Figure 31 shows a system incorporating the adaptive resolution decoding. Figures 32a-d illustrate alternative transcoder architectures. In particular, Figure 32a has an initial parser which extracts the MPEG video from the audio and performs similar functions, separate B frame and I/P frame processors which reflect the full resolution decoding possibility for the I/P frame macroblocks prior to downsampling, and an MPEG encoder if the transcoder is to be used with an existing MPEG decoder as illustrated in Figure 32b. The post processor performs further processing on spatial domain video, such as resizing, anti-flicker filtering, square pixel conversion, progressive-interlace conversion, et cetera. Figure 32c shows use of the downsampled output directly, and Figure 32d shows a hybrid use of an existing MPEG decoder only for B frames.

Adaptive resolution P frame preferred embodiment

[0015] The adaptive resolution P frame macroblock preferred embodiments decode I frame macroblocks at full resolution (e.g., HDTV 1920 by 1080), B frame macroblocks at reduced resolution (e.g., 960 by 540), and P frames with a mixture of some macroblocks at full resolution and some at reduced resolution. The decision of whether to decode a P frame macroblock at full or reduced resolution can be made using various measures and can adapt to the situation. For example, decide to decode an input P frame motion vector plus associated macroblock (four 8x8 DCT luminance residual blocks (and optionally the two 8x8 DCT chrominance residual blocks)) at full resolution when the sum of the magnitudes of the (luminance) residual DCT high frequency coefficients exceeds a threshold. Alternatively, select a macroblock for full resolution decoding if its motion vector (MV) points to a stored (mostly) full resolution decoded P frame macroblock or a stored I frame macroblock with high energy or edge content. For such macroblocks the motion compensation at reduced resolution may generate motion vector drift.
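
The first criterion above can be sketched in a few lines of Python. This is only an illustration: the text does not say which coefficients count as "high frequency" or what the threshold is, so the 4x4 low-frequency corner (`low=4`), the function names, and the threshold argument are all assumptions.

```python
def high_freq_magnitude(block, low=4):
    """Sum of |coefficient| outside the low x low corner of an 8x8 DCT block.

    Treating everything outside the 4x4 low-frequency corner as "high
    frequency" is an assumption made for this sketch.
    """
    return sum(abs(block[u][v])
               for u in range(8) for v in range(8)
               if u >= low or v >= low)


def needs_full_resolution(luma_residual_blocks, threshold):
    """True when the four residual luminance DCT blocks of a P frame
    macroblock carry more high-frequency magnitude than the threshold."""
    total = sum(high_freq_magnitude(b) for b in luma_residual_blocks)
    return total > threshold
```

A macroblock whose residual has only DC content would stay on the reduced resolution path, while one with strong coefficients near the high-frequency corner would be selected for full resolution decoding.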

Figures 30a-c show the flow for P frame macroblocks. In more detail, decode as follows (with Y indicating luminance, Cb and Cr indicating chrominance, MV indicating motion vector, and Δ indicating residual):

(a) I frame macroblocks:

1. Apply inverse DCT to the four 8x8 Y DCT (and optionally to the 8x8 Cb DCT and 8x8 Cr DCT) to get 16x16 Y (and 8x8 Cb and 8x8 Cr). The chrominance alternative is to downsample the Cb and Cr DCTs by taking the low frequency 4x4 and then apply inverse DCT to obtain 4x4 Cb and Cr.
2. Store 16x16 Y (and 8x8 Cb and 8x8 Cr) for use as references on subsequent P frames and B frames.
3. 4-point downsample (or other spatial downsample; see discussion below) to 8x8 Y and 4x4 Cb and 4x4 Cr for reduced resolution display/output, and optionally repack in groups of four (i.e., four 8x8 Y and one 8x8 Cb and one 8x8 Cr) to form a display/output (reduced resolution) macroblock.

(b) P frame macroblocks: categorize as either (1) to-be-fixed (full resolution decode) or (2) not-fixed (reduced resolution decode)

(1) For a to-be-fixed macroblock

[0016]

1. Use MV and a reference 16x16 Y (optionally 8x8 Cb, Cr) stored macroblock generated from full resolution 16x16 Y (and 8x8 Cb, Cr) of stored previous I or fixed P macroblocks and/or 16x16 Y, 8x8 Cb, Cr upsampled from stored 8x8 Y, 4x4 Cb, Cr of stored previous not-fixed P macroblocks; see Figure 33 and related discussion about references below. The upsampling may be any interpolation method, which may use boundary pixels of abutting stored blocks.
2. Apply inverse DCT to four 8x8 ΔY DCT (optionally 8x8 ΔCb, ΔCr DCT) to get four 8x8 ΔY (8x8 ΔCb, ΔCr).
3. Add the full resolution reference macroblock from step 1 and full resolution residual macroblock from step 2 to reconstruct full resolution four 8x8 Y (8x8 Cb, Cr).







EP 0 964 583 A2



4. Store the reconstructed 16x16 Y (and 8x8 Cb, Cr) for reference use on the next P frame and B frames (and convert to an intra coded macroblock).
5. 4-point average downsample (or other downsample) to 8x8 Y and 4x4 Cb, Cr for display/output and optionally repack in groups of four for a display/output reduced resolution macroblock.

(2) For a not-fixed macroblock

[0017]

1. Use MV/2 and generate an 8x8 Y, 4x4 Cb, Cr reference from stored 8x8 Y, 4x4 Cb, Cr of previous not-fixed P and/or 8x8 Y, 4x4 Cb, Cr downsampled from stored full resolution (16x16 Y and possibly 8x8 Cb, Cr) I and fixed P macroblocks. Because MV has 1/2 pixel resolution, MV/2 has 1/4 pixel resolution, so the 8x8 Y, 4x4 Cb, Cr reference may be generated by 3 to 1 weightings.
2. Downsample four 8x8 ΔY DCT, 8x8 ΔCb, ΔCr DCT to get 8x8 ΔY DCT, 4x4 ΔCb, ΔCr DCT.
3. Apply inverse DCT to 8x8 ΔY DCT, 4x4 ΔCb, ΔCr DCT to get 8x8 ΔY, 4x4 ΔCb, ΔCr.
4. Add the reference from step 1 and the residual from step 3 to reconstruct 8x8 Y, 4x4 Cb, Cr.
5. Store 8x8 Y and 4x4 Cb, Cr for reference on the next P frame and B frames and display/output, or optionally repack in a group of four to output a reduced resolution four 8x8 Y, 8x8 Cb, Cr.

(c) B frame macroblocks

[0018]

1. Use MV/2 for both motion vectors and generate an 8x8 Y, 4x4 Cb, Cr reference from stored 8x8 Y, 4x4 Cb, Cr of previous not-fixed P and/or 8x8 Y, 4x4 Cb, Cr downsampled from stored full resolution (four 8x8 Y, 8x8 Cb, Cr) I and fixed P macroblocks. Because MV has 1/2 pixel resolution, MV/2 has 1/4 pixel resolution, so the 8x8 Y, 4x4 Cb, Cr reference may be generated by 3 to 1 weightings.
2. Downsample four 8x8 ΔY DCT, 8x8 ΔCb, ΔCr DCT to get 8x8 ΔY DCT, 4x4 ΔCb, ΔCr DCT.
3. Apply inverse DCT to 8x8 ΔY DCT, 4x4 ΔCb, ΔCr DCT to get 8x8 ΔY, 4x4 ΔCb, ΔCr.
4. Add the reference from step 1 and the residual from step 3 to reconstruct 8x8 Y, 4x4 Cb, Cr and optionally repack in a group of four to display/output a reduced resolution four 8x8 Y, 8x8 Cb, Cr.
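
The "3 to 1 weightings" mentioned in the steps above can be illustrated with a one-dimensional sketch. It assumes plain linear interpolation between the two nearest stored pixels, a non-negative sample position, and clamping at the right edge of the row; the function names are hypothetical and a real decoder would work in two dimensions.

```python
def sample_quarter_pel(row, pos_quarters):
    """Linearly interpolate a 1-D row of pixels at position pos_quarters / 4.

    Assumes pos_quarters >= 0 and that the integer part lies inside the row;
    the right-hand neighbour is clamped at the edge of the row.
    """
    base, frac = divmod(pos_quarters, 4)
    a = row[base]
    b = row[min(base + 1, len(row) - 1)]
    # frac = 1 gives (3a + b) / 4 and frac = 3 gives (a + 3b) / 4,
    # i.e. the 3 to 1 weightings; frac = 2 is the plain half-pixel average.
    return ((4 - frac) * a + frac * b) / 4


def reference_pixel(row, x, mv_half_pel):
    """Reference value for reduced resolution pixel x.

    mv_half_pel is a full resolution motion vector component in half-pixel
    units, so MV/2 is that same number counted in quarter pixels of the
    reduced resolution row.
    """
    return sample_quarter_pel(row, 4 * x + mv_half_pel)
```

Because a half-pixel step at full resolution is a quarter-pixel step after halving the frame, no rescaling of the motion vector value is needed in this representation; only its unit changes.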

The motion vector derives from the luminance part of the macroblocks, so whether the chrominance is decoded at full resolution or reduced resolution will not affect motion vector drift. Thus the full resolution decoding of I frame macroblocks and to-be-fixed P frame macroblocks may only involve the luminance blocks. The chrominance blocks can all be downsampled in the DCT domain by taking the 4x4 low frequency subblock and applying a 4x4 inverse DCT, and use the motion vector divided by 2.

The alternatives for an HDTV P frame thus include downsampling the 32,400 8x8 DCT residual luminance blocks into 8,100 8x8 DCT residual luminance blocks directly in the DCT domain as described below (and analogously for the chrominance blocks), and then categorizing these blocks as either (1) to be fixed or (2) no fix is needed. Alternatively, assess the need for fixing prior to downsampling to eliminate unnecessary downsampling in the DCT domain. Further, the categorization criteria can adapt to available computational power.

[0019] The preferred embodiment downsampling may be performed in various systems, such as a set top box on a standard definition TV so as to enable reception of HDTV signals and conversion to standard TV signals.

Downsampling in the DCT domain

[0020] Preferred embodiment downsampling is done in the DCT domain. The input data stream to an HDTV decoder is in MPEG-2 format. Pixel data are coded as DCT coefficients of 8x8 blocks. A prior art downsampling scheme would be to perform an inverse DCT operation on the data to recover the pixel values in the spatial domain and then perform downsampling in the spatial domain to reduce resolution and size. Because the full resolution original picture needs to be stored in the spatial domain, the operation has large memory storage requirements. In addition, the two-step operation also results in large computational requirements. The preferred embodiment DCT-domain downsampling converts full resolution and size DCT domain input data directly to reduced resolution and size spatial domain pixel values in one step, thus eliminating the need for storing the full resolution picture (especially B frames) in the spatial pixel domain and also limiting computational requirements. The downsampling operation can be represented as a matrix operation of the type $M X M^T$ where M is the downsampling matrix and X is the input DCT coefficients. M is 8 by 16 when X is the 16x16 composed of four 8x8 DCT luminance blocks of a macroblock; and so $M X M^T$ is 8x8.
Two types of preferred embodiment downsampling matrices have shown good results: lowpass filtering in the DCT domain and 4-point averaging in the spatial domain. The lowpass filtering in the DCT domain has an 8x16 downsampling matrix M:

$$M = D[8]^T \,\begin{bmatrix} I & 0 \end{bmatrix}\, D[16] \begin{bmatrix} D[8]^T & 0 \\ 0 & D[8]^T \end{bmatrix}$$

where I is the 8x8 identity matrix, 0 the 8x8 zero matrix, D[16] is the 16x16 DCT transform matrix, and D[8] is the 8x8 DCT transform matrix. From right to left: the diagonal block $D[8]^T$'s perform an inverse DCT of the four 8x8 blocks to make the 16x16 in the spatial domain, the D[16] performs a 16x16 DCT on the 16x16, the [I 0] selects out the low frequency 8x8 of the 16x16, and the $D[8]^T$ performs a final inverse DCT to yield the downsampled 8x8 in the spatial domain.
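
As a rough numerical illustration of this factorization, the following pure-Python sketch builds M and applies it as $M X M^T$. It assumes orthonormal DCT-II matrices whose rows are the cosine basis vectors, so that the transpose inverts the transform as in the description above; the helper names are hypothetical. With that normalization the product carries a constant gain of 2, which the sketch leaves in place rather than guessing how an implementation would rescale it.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix; row u is the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                     for k in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def lowpass_matrix():
    """8x16 matrix M = D8^T [I 0] D16 blockdiag(D8^T, D8^T)."""
    d8t = transpose(dct_matrix(8))
    d16 = dct_matrix(16)
    zero = [0.0] * 8
    block_diag = [row + zero for row in d8t] + [zero + row for row in d8t]
    low_rows = d16[:8]  # [I 0] D16 keeps the 8 lowest-frequency rows
    return matmul(d8t, matmul(low_rows, block_diag))


def downsample(x_dct):
    """x_dct: 16x16 array holding the four 8x8 DCT blocks of a macroblock.

    Returns the 8x8 spatial-domain result M X M^T (gain of 2 not removed).
    """
    m = lowpass_matrix()
    return matmul(matmul(m, x_dct), transpose(m))
```

For example, a flat macroblock of pixel value 10 has a DC coefficient of 80 in each of its four 8x8 DCT blocks, and `downsample` returns a flat 8x8 block (of value 20 under this normalization).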
[0021] Similarly, averaging in the spatial domain has a downsampling matrix M:

$$M = \tfrac{1}{4} \begin{bmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 \end{bmatrix} \begin{bmatrix} D[8]^T & 0 \\ 0 & D[8]^T \end{bmatrix}$$

where again the diagonal $D[8]^T$'s perform an inverse DCT of the four 8x8 blocks to make the 16x16 in the spatial domain and the 8x16 matrix of 0s and 1s performs a 4-point averaging (groups of 2x2 pixels are averaged to form a single downsampled pixel).
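
Once the blocks are in the spatial domain, the 4-point averaging itself is a simple operation. A minimal sketch, assuming the pixels are held as a list of rows with even dimensions (the function name is hypothetical):

```python
def average_2x2(pixels):
    """Halve each dimension by replacing every 2x2 group of pixels with its mean.

    Assumes an even number of rows and columns, e.g. a 16x16 luminance
    macroblock, which becomes an 8x8 block.
    """
    height, width = len(pixels), len(pixels[0])
    return [[(pixels[r][c] + pixels[r][c + 1]
              + pixels[r + 1][c] + pixels[r + 1][c + 1]) / 4
             for c in range(0, width, 2)]
            for r in range(0, height, 2)]
```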

[0022] Details of the downsampling by low pass filtering in the DCT domain

Rather than just discard the DCT high frequency coefficients (e.g., just keep the 4x4 low frequency coefficients of each 8x8 DCT block) to reduce inverse DCT computation and reduce reconstructed frame resolution, generate a 16x16 DCT using the four 8x8 DCT luminance blocks of a macroblock and then discard the 16x16 DCT high frequency coefficients (e.g., retain the 8x8 low frequency coefficients) to reduce inverse DCT computation and reduce resolution. This switch to a macroblock basis yields computational advantage because the 16x16 DCT coefficients of the macroblock can be expressed in terms of the 8x8 block DCT coefficients, and certain symmetries in this computation can be exploited. And the low pass filtering with the larger 16x16 yields better results than just patching together four 4x4 low pass filterings.

[0023] More particularly, let P(j,k) be a 16x16 macroblock made up of the four 8x8 blocks $P_{00}$, $P_{01}$, $P_{10}$, and $P_{11}$:

$$P = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix}$$

The 16x16 DCT coefficients of P, denoted by W(m,n), are given by:

$$W(m,n) = \frac{1}{8} \sum_{j}\sum_{k} P(j,k)\,\cos[\pi(2j+1)m/32]\,\cos[\pi(2k+1)n/32]$$

where the sums are over $0 \le j \le 15$ and $0 \le k \le 15$, plus an extra factor of $1/\sqrt{2}$ when m = 0 or n = 0. W is 16x16 and the foregoing two dimensional DCT definition may be interpreted as two matrix multiplications of 16x16 matrices: $W = D[16]^T P\, D[16]$ where the 16x16 matrix D[16] has elements $D[16](k,n) = (1/\sqrt{8})\cos[\pi(2k+1)n/32]$ (with an extra factor of $1/\sqrt{2}$ when n equals 0) and $D[16]^T$ is the transpose of D[16]. Of course, left multiplication by $D[16]^T$ gives the DCT for the column variable and right multiplication by D[16] gives the DCT for the row variable. D[16] is an orthogonal matrix ($D[16]\,D[16]^T = I$) due to the orthogonality of the cosines of different frequencies. This implies that the inverse DCT is given by: $P = D[16]\, W\, D[16]^T$.
[0024] Also, W can be considered as made up of four 8x8 blocks $W_{00}$, $W_{01}$, $W_{10}$, and $W_{11}$:

$$W = \begin{bmatrix} W_{00} & W_{01} \\ W_{10} & W_{11} \end{bmatrix}$$

$W_{00}$ are the low spatial frequency coefficients, and the preferred embodiment downsamples by taking $W_{00}$ as the DCT coefficients for an 8x8 block resulting from a downsampling of the original 16x16 macroblock P. That is, $W_{00}$ is the DCT of the desired reduced resolution downsampled version of P. Indeed, an HDTV frame of 1080 rows of 1920 pixels downsampled by 4 yields 540 rows of 960 pixels, which is close to the standard TV frame of 576 rows of 720 pixels.

[0025] $W_{00}$ can be expressed in terms of the DCTs of the 8x8 blocks $P_{00}$, $P_{01}$, $P_{10}$, $P_{11}$; these DCTs are in the bitstream. Denote these DCTs by $\hat{P}_{00}$, $\hat{P}_{01}$, $\hat{P}_{10}$, $\hat{P}_{11}$. Let the 8x8 matrix D[8] have elements $D[8](k,n) = \tfrac{1}{2}\cos[\pi(2k+1)n/16]$ (with an extra factor of $1/\sqrt{2}$ when n equals 0); then D[8] is orthogonal and the 8x8 DCT transformation is matrix pre and post multiplication by $D[8]^T$ and D[8], respectively: $\hat{P}_{00} = D[8]^T P_{00} D[8]$, ..., $\hat{P}_{11} = D[8]^T P_{11} D[8]$, and the inverse DCTs are: $P_{00} = D[8]\hat{P}_{00}D[8]^T$, ..., $P_{11} = D[8]\hat{P}_{11}D[8]^T$. Inserting the inverse DCT expressions for $P_{00}$, $P_{01}$, $P_{10}$, and $P_{11}$ into the definition of W and performing the 16x16 matrix multiplications as 8x8 submatrix multiplications with the 16x16 matrix D[16] expressed as the four 8x8 submatrices $D[16]_{00}$, ..., $D[16]_{11}$, yields:

$$\begin{aligned}
W_{00} &= D[16]_{00}^{T} D[8]\,\hat{P}_{00}\,D[8]^T D[16]_{00} \\
&\quad + D[16]_{10}^{T} D[8]\,\hat{P}_{10}\,D[8]^T D[16]_{00} + D[16]_{00}^{T} D[8]\,\hat{P}_{01}\,D[8]^T D[16]_{10} \\
&\quad + D[16]_{10}^{T} D[8]\,\hat{P}_{11}\,D[8]^T D[16]_{10} \\
&= (S\hat{P}_{00} + T\hat{P}_{10})\,S^T + (S\hat{P}_{01} + T\hat{P}_{11})\,T^T
\end{aligned}$$

where $S = D[16]_{00}^{T} D[8]$ and $T = D[16]_{10}^{T} D[8]$ are both 8x8 matrices but together have only a few nontrivial components. Indeed,
where

$$\begin{aligned}
a_0 &= \tfrac{1}{4}\textstyle\sum \cos[\pi(2n+1)/32] \\
a_1 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)/16] \\
a_2 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)2/16] \\
a_3 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)3/16] \\
b_0 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)4/16] \\
b_1 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)5/16] \\
b_2 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)6/16] \\
b_3 &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)/32]\cos[\pi(2n+1)7/16] \\
a_4 &= \tfrac{1}{4}\textstyle\sum \cos[\pi(2n+1)3/32] \\
&\;\;\vdots \\
b_{15} &= \tfrac{1}{\sqrt{8}}\textstyle\sum \cos[\pi(2n+1)7/32]\cos[\pi(2n+1)7/16]
\end{aligned}$$

with the sums over $0 \le n \le 7$. In terms of S and T, the computations to find $W_{00}$ amount to three repetitions of 8x8 matrix multiplications with S and T plus matrix addition of the products, and three transpositions: $W_{00} = (S M^T + T N^T)^T$ with $M = S\hat{P}_{00} + T\hat{P}_{10}$ and $N = S\hat{P}_{01} + T\hat{P}_{11}$. Many terms are shared among these computations: consider generally Z = SX + TY for X, Y, and Z all 8x8 matrices. Then the particular forms of S and T imply for j = 0, 1, ..., 7:

$$\begin{aligned}
Z(0,j) &= X(0,j) + Y(0,j) \\
Z(1,j) &= a_0[X(0,j)-Y(0,j)] + a_1[X(1,j)+Y(1,j)] + a_2[X(2,j)-Y(2,j)] + a_3[X(3,j)+Y(3,j)] \\
&\quad + b_0[X(4,j)-Y(4,j)] + b_1[X(5,j)+Y(5,j)] + b_2[X(6,j)-Y(6,j)] + b_3[X(7,j)+Y(7,j)] \\
Z(2,j) &= X(1,j) - Y(1,j) \\
Z(3,j) &= a_4[X(0,j)-Y(0,j)] + a_5[X(1,j)+Y(1,j)] + a_6[X(2,j)-Y(2,j)] + a_7[X(3,j)+Y(3,j)] \\
&\quad + b_4[X(4,j)-Y(4,j)] + b_5[X(5,j)+Y(5,j)] + b_6[X(6,j)-Y(6,j)] + b_7[X(7,j)+Y(7,j)] \\
Z(4,j) &= X(2,j) + Y(2,j) \\
Z(5,j) &= a_8[X(0,j)-Y(0,j)] + a_9[X(1,j)+Y(1,j)] + a_{10}[X(2,j)-Y(2,j)] + a_{11}[X(3,j)+Y(3,j)] \\
&\quad + b_8[X(4,j)-Y(4,j)] + b_9[X(5,j)+Y(5,j)] + b_{10}[X(6,j)-Y(6,j)] + b_{11}[X(7,j)+Y(7,j)] \\
Z(6,j) &= X(3,j) - Y(3,j) \\
Z(7,j) &= a_{12}[X(0,j)-Y(0,j)] + a_{13}[X(1,j)+Y(1,j)] + a_{14}[X(2,j)-Y(2,j)] + a_{15}[X(3,j)+Y(3,j)] \\
&\quad + b_{12}[X(4,j)-Y(4,j)] + b_{13}[X(5,j)+Y(5,j)] + b_{14}[X(6,j)-Y(6,j)] + b_{15}[X(7,j)+Y(7,j)]
\end{aligned}$$
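
The alternating sign pattern in the equations for Z can be checked numerically. The sketch below, in plain Python with hypothetical helper names, builds S and T from the definitions of D[8] and D[16] given in the text (taking the stated orthonormal scaling at face value) and recomputes the low-frequency block from the four 8x8 block DCTs. Under that scaling T(m,n) = (-1)^(m+n) S(m,n), and each even row of S has a single nonzero entry equal to 1/sqrt(2), so the unit coefficients in the even-row equations hold up to that constant factor.

```python
import math


def dct_cols(n):
    """D[k][m] = c(m) * sqrt(2/n) * cos(pi*(2k+1)*m/(2n)), with c(0) = 1/sqrt(2):
    spatial index k on rows, frequency index m on columns."""
    return [[math.sqrt((1 if m == 0 else 2) / n)
             * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
             for m in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


D8, D16 = dct_cols(8), dct_cols(16)
# S = D[16]_00^T D[8] and T = D[16]_10^T D[8], as defined in the text.
S = matmul(transpose([row[:8] for row in D16[:8]]), D8)
T = matmul(transpose([row[:8] for row in D16[8:]]), D8)


def w00_from_block_dcts(p00, p01, p10, p11):
    """(S P^00 + T P^10) S^T + (S P^01 + T P^11) T^T for the four block DCTs."""
    m = add(matmul(S, p00), matmul(T, p10))
    n = add(matmul(S, p01), matmul(T, p11))
    return add(matmul(m, transpose(S)), matmul(n, transpose(T)))


def block_dct(block):
    """8x8 DCT of a spatial block: D[8]^T P D[8]."""
    return matmul(matmul(transpose(D8), block), D8)


def direct_w00(macroblock):
    """Top-left 8x8 of the full 16x16 DCT, D[16]^T P D[16]."""
    w = matmul(matmul(transpose(D16), macroblock), D16)
    return [row[:8] for row in w[:8]]
```

Comparing `w00_from_block_dcts` against `direct_w00` on an arbitrary macroblock confirms that the four block DCTs from the bitstream are sufficient to obtain the low-frequency 8x8 without ever forming the full 16x16 transform.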
There are many terms that are shared among the foregoing equations for the Z(i,j), and precomputation of them can save more computation as follows. Define:

$$\begin{aligned}
A_0 &= X(0,j) + Y(0,j) \\
A_1 &= X(0,j) - Y(0,j) \\
B_0 &= X(1,j) - Y(1,j) \\
B_1 &= X(1,j) + Y(1,j) \\
C_0 &= X(2,j) + Y(2,j) \\
C_1 &= X(2,j) - Y(2,j) \\
D_0 &= X(3,j) - Y(3,j) \\
D_1 &= X(3,j) + Y(3,j) \\
E &= X(4,j) - Y(4,j) \\
F &= X(5,j) + Y(5,j) \\
G &= X(6,j) - Y(6,j) \\
H &= X(7,j) + Y(7,j)
\end{aligned}$$

Thus the Z(i,j) equations become:

$$\begin{aligned}
Z(0,j) &= A_0 \\
Z(1,j) &= a_0 A_1 + a_1 B_1 + a_2 C_1 + a_3 D_1 + b_0 E + b_1 F + b_2 G + b_3 H \\
Z(2,j) &= B_0 \\
Z(3,j) &= a_4 A_1 + a_5 B_1 + a_6 C_1 + a_7 D_1 + b_4 E + b_5 F + b_6 G + b_7 H \\
Z(4,j) &= C_0 \\
Z(5,j) &= a_8 A_1 + a_9 B_1 + a_{10} C_1 + a_{11} D_1 + b_8 E + b_9 F + b_{10} G + b_{11} H \\
Z(6,j) &= D_0 \\
Z(7,j) &= a_{12} A_1 + a_{13} B_1 + a_{14} C_1 + a_{15} D_1 + b_{12} E + b_{13} F + b_{14} G + b_{15} H
\end{aligned}$$

The total computation needed to obtain Z(i,j) can be estimated from the foregoing equations (32 multiplications and 40 additions) as 72 operations for each column Z(·,j). To compute Z thus takes 8 * 72 = 576 operations. Thus the computation of $W_{00}$ will take 3 * 576 = 1728 operations.

[0026] Therefore, a 16x16 macroblock can be downsampled with 1728 operations. To downsample a full-size 1080x1920 HDTV sequence at 30 frames/second (assuming all frame macroblocks) implies computing power (number of instructions for a DSP with one cycle multiplications) of:

(1080/16)*(1920/16)*1728*30 instructions per second ≈ 425 MIPS.
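
The operation count and the resulting instruction rate can be reproduced with a few lines of arithmetic. The variable names below are illustrative only; the figures are the ones stated in the text.

```python
# Per-column cost of Z = SX + TY, as counted in the text.
mults_per_column = 32
adds_per_column = 40
ops_per_column = mults_per_column + adds_per_column   # 72

ops_per_product = 8 * ops_per_column                  # 576 for one 8x8 result
ops_per_macroblock = 3 * ops_per_product              # 1728 for the low-frequency block

# 1080 x 1920 frame, 16x16 macroblocks, 30 frames per second.
macroblocks_per_frame = (1080 / 16) * (1920 / 16)     # 8100.0
instructions_per_second = macroblocks_per_frame * ops_per_macroblock * 30

print(instructions_per_second / 1e6)  # about 420 million instructions per second
```

The product evaluates to roughly 420 million instructions per second, in line with the figure of about 425 MIPS quoted above.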

[0027] Store the downsampled 8x8 blocks of the I frame in a buffer. These blocks will be used in the motion compensated reconstruction of the subsequent P and B frames.

Motion vector drift in P frames 

[0028] Decoding P and B frames requires both the motion vector predicted macroblocks from stored P and/or I frames and the inverse DCT of the residuals. The residual macroblock DCT (four 8x8 DCT luminance residual blocks plus two 8x8 DCT chrominance residual blocks) can be downsampled in the DCT domain as described in the foregoing. The motion vectors may be scaled down (i.e., divide both components by 2 and optionally round to the nearest half pixel locations if the scaled motion vector is to be output). However, a P frame following several P frames after an I frame may exhibit flickering about highly detailed textures and jaggedness around moving edges. The problem traces back to a loss of accuracy in the motion vector. Consequently, the preferred embodiment assesses the likelihood of motion vector drift for a P frame (downsampled) macroblock and selectively fixes macroblocks with a high likelihood by decoding at full resolution prior to downsampling for display/output. (The decoding only performs inverse DCT for the pixels that are needed in some embodiments.) For all B frame macroblocks and for P frame macroblocks which are not likely to have motion vector drift, the macroblocks of residuals are downsampled in the DCT domain as in the foregoing, and the motion vectors are just divided by two in the reconstructed downsampled frames.

[0029] In particular, for a P frame 16x16 macroblock of DCT residuals (four 8x8 DCT luminance blocks of residuals in the bitstream) first perform the downsampling in the DCT domain as described in the foregoing to yield $W_{00}$, the 8x8 DCT of the downsampled block of residuals. Next, measure the energy of $W_{00}$ by the sum of squares of the coefficients ($\sum\sum W_{00}(j,k)^2$) with the sum over the range $0 \le j, k \le 7$ and also measure the fraction of energy which is high spatial frequency energy of $W_{00}$ by the sum of the squares of the coefficients with the sum excluding the subrange $0 \le j, k \le 3$. If the energy is greater than a threshold and the portion of high frequency energy is greater than a second threshold, then classify the block as needing to be fixed (full resolution macroblock decoding); otherwise classify the block as not to be fixed (available for DCT domain downsampling). All B frame macroblocks are classified as available for DCT domain downsampling; B frames only predict from P or I frames, so they do not incur motion vector drift once the P frames overcome motion vector drift.
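
The two-threshold test just described can be sketched as follows. The threshold values are not given in the text, so they are left as parameters; the function name and the string labels are hypothetical.

```python
def classify_block(w00, energy_threshold, high_fraction_threshold):
    """Classify a downsampled 8x8 residual DCT block as "fix" or "downsample".

    w00 is the 8x8 array of coefficients. The total energy is the sum of
    squares over 0 <= j, k <= 7; the high frequency energy excludes the
    subrange 0 <= j, k <= 3.
    """
    total = sum(c * c for row in w00 for c in row)
    low = sum(w00[j][k] ** 2 for j in range(4) for k in range(4))
    high = total - low
    if total == 0:
        return "downsample"  # an all-zero residual never needs fixing
    if total > energy_threshold and high / total > high_fraction_threshold:
        return "fix"
    return "downsample"
```

Since the thresholds are plain arguments, a scheduler could raise them when the processor is heavily loaded and lower them when spare capacity is available.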

[0030] Alternative determinations of which P frame macroblocks to fix may be made, and the determination may be made prior to downsampling, so the full resolution inverse DCT could be used and then the reconstructed macroblock stored at full resolution and lastly spatially downsampled for output at reduced resolution. The characteristics of a macroblock for fixing include: large high frequency components, large motion vector, motion vector points to a stored full resolution fixed macroblock, et cetera. The idea is that if a block has a lot of high frequency components (large DCT coefficients at high frequencies), then it needs fixing. Also, if a block is in a high motion region (large motion vector) it may not need fixing (unless the DCT high frequency components are too large) because rapid motion is less precisely perceived. Also, a P frame macroblock represents residuals, so a P frame macroblock with a high energy or edge content I macroblock as its reference may need fixing to maintain accuracy. Further, fixing P frame macroblocks takes computational power, so the decision to fix or not may include a consideration of currently available computational power; for example, thresholds can be adjusted depending upon load.

[0031] For selected blocks needing to be fixed with full 16x16 macroblock decoding, reconstruct as follows. First, use the full motion vector to locate the 16x16 reference macroblock (or 17x17 for half pixel motion vectors) in the preceding full resolution I or P frame (the stored I frame has full resolution, but the P frame may be (partially) stored in reduced resolution and this will lead to upsampling of the stored reduced resolution portions). The reference macroblock straddles (at most) nine different 8x8 blocks as illustrated in Figure 33 where the broken-line large square is the reference 16x16 macroblock and the numbered solid line blocks are the 8x8 blocks covered by the reference macroblock. These nine 8x8 blocks are blocks of at most four 16x16 (2x2 array of spatial 8x8s) macroblocks. If one or more of these four macroblocks is stored at full resolution (i.e., an I macroblock or a fixed P macroblock), then simply use the pixels of the 8x8 for the corresponding portion of the reference 16x16. Contrarily, if any of these four macroblocks is stored with reduced resolution (e.g., a not-fixed P macroblock), then for these macroblocks (which are stored as 8x8 luminance and 4x4 chrominance) upsample (at least a portion of) the 8x8 luminance block to a 16x16 simply by interpolation (this may use boundary pixels of abutting stored macroblocks and may simply be linear interpolation, or a context-based interpolation may be used) and use the upsampled pixels for the corresponding portions of the 16x16 reference. Thus the reference macroblock will be full resolution 16x16, and the residual DCT has full resolution inverse DCT to add to the reference.

[0032] For P macroblocks that do not need fixing (and all B macroblocks), just downsample the residual DCT in the DCT domain as in the foregoing, and divide the motion vector components by 2. Locate the reference block (8x8 at reduced resolution), which will lie in a group of at most four 8x8 reduced resolution blocks. If any of these 8x8 reduced resolution blocks is stored at full resolution, then use a 4-point or other spatial downsample to make it 8x8 reduced resolution. Use the pixels of the reduced resolution 8x8 for the corresponding pixels of the 8x8 reference; the 1/4 pixel motion vector resolution may require 3 to 1 weightings to make the reference 8x8.

The chrominance blocks may be treated analogously, except that the full resolution is 8x8 and downsampling is just low pass filtering to a 4x4 DCT. But motion vectors are derived from luminance only, so full resolution chrominance is not needed to deter motion vector drift.

Figures 30a-c are a flow diagram for the P macroblocks showing the decision of to be fixed or not fixed. Note that a lookup table (hash table) keeps track of the fixed macroblocks and can be used to help adapt to currently available computation power or memory.






EP 0 964 583 A2 



Cropped alternative adaptive P frames 
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20 






[0033] An alternative preferred embodiment for handling the P frame macroblocks to be fixed without upsampling stored reduced resolution proceeds as follows. The reference macroblock straddles (at most) nine different 8x8 blocks as illustrated in Figure ? where the broken-line large square is the reference macroblock and the numbered solid line blocks are the 8x8 blocks covered by the reference macroblock. However, only a portion (sometimes a small portion) of the pixels inside the 8x8 blocks are used in the reference macroblock; in the extreme case, only one pixel of a block is used. Because only the high energy macroblocks need full decoding, the usual approach of applying the inverse DCT to all of the relevant blocks (i.e., all nine blocks in Figure 2) wastes computing power: thus crop the blocks in the DCT (frequency) domain as described in the following paragraphs, and inverse DCT only the cropped portions. This yields a full resolution reference macroblock. Then add the inverse DCT of the 16x16 macroblock of DCT residuals. Lastly, downsample this full resolution macroblock to yield the 8x8 downsampled block for the reconstruction of the P frame. Also store the full resolution macroblock because a subsequent P frame macroblock may need selective decoding and will use this full resolution macroblock as the reference macroblock. Of course, the last P frame before the next I frame does not need any full resolution storage because B frame macroblocks are all treated as low energy/edge.
[0034] The operation on each 8x8 block involved in a reference macroblock is either (1) obtain all of the pixels in the block or (2) crop the block so that only the pixels needed remain. In matrix terminology, the operation of cropping a part of a block can be written as matrix multiplications. For instance, cropping the last m rows of an 8x8 matrix $A$ can be written as $A_c = C_L A$ where $C_L$ is 8x8 with all components 0 except $C_L(j,j) = 1$ for $8-m \le j \le 7$. Similarly, postmultiplication by $C_R$ crops the last n columns if $C_R$ has all 0 components except $C_R(j,j) = 1$ for $8-n \le j \le 7$. Thus the operation of cropping the lower right m rows by n columns submatrix of $A$ can be written as $A_c = C_L A C_R$. Then denoting the DCT of $A$ by $\hat{A}$ implies $A = D[8]^T \hat{A} D[8]$ where $D[8]$ again is the 8x8 DCT transformation matrix. Thus $A_c = C_L D[8]^T \hat{A} D[8] C_R$, and again name the products as $U = C_L D[8]^T$ and $V = C_R D[8]^T$ so that $A_c = U \hat{A} V^T$. Note that the first 8-m rows of $U$ are all zeros and the first 8-n columns of $V^T$ are all zeros. Thus denoting the m x 8 matrix of the m nonzero rows of $U$ as $U_C$ and the n x 8 matrix of the n nonzero rows of $V$ as $V_C$, the m x n matrix $A_{cropped}$ consisting of the cropped portion of $A$ is given by $A_{cropped} = U_C \hat{A} V_C^T$. Actually, $U_C$ is the last m rows of the inverse 8x8 DCT matrix, and $V_C$ is the last n rows of the inverse 8x8 DCT matrix. The inverse 8x8 DCT matrix is $D[8]^T$.

[0035] The number of operations needed to compute $B = U_C \hat{A}$ is m*13*8 = 104m, where $B$ is an m x 8 matrix. Computing $A_{cropped} = B V_C^T$ needs m*13*n = 13mn operations. The total for one block is 104m + 13mn = (13n + 104)m. Of course, computing $A_{cropped}^T$ essentially also computes $A_{cropped}$, and by symmetry this takes (13m + 104)n operations. Thus $A_{cropped}$ can be computed with [13*max(m,n) + 104]*min(m,n) operations.
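
The cropping identity can be checked numerically with a short pure-Python sketch. The helper names (`dct_matrix`, `matmul`, `transpose`, `cropped_inverse_dct`) are hypothetical, and an orthonormal DCT-II scaling is assumed so that the transpose of the DCT matrix is its inverse; the text itself does not fix a scaling.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal n-point DCT-II matrix D (assumed scaling), so D^T inverts it."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def cropped_inverse_dct(a_hat, m, n):
    """Lower-right m x n corner of the inverse DCT of a_hat, without a full inverse."""
    d_t = transpose(dct_matrix())   # inverse DCT matrix D^T
    u_c = d_t[N - m:]               # last m rows of D^T (m x 8)
    v_c = d_t[N - n:]               # last n rows of D^T (n x 8)
    b = matmul(u_c, a_hat)          # m x 8 intermediate
    return matmul(b, transpose(v_c))  # m x n cropped pixels

# Usage: compare against cropping in the spatial domain.
a = [[(3 * i + 5 * j) % 11 for j in range(N)] for i in range(N)]
d = dct_matrix()
a_hat = matmul(matmul(d, a), transpose(d))   # forward 2-D DCT of a
corner = cropped_inverse_dct(a_hat, 3, 2)    # should match rows 5-7, columns 6-7 of a
```

The sketch uses ordinary dense products, so it illustrates the algebra rather than the operation counts discussed in the text.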

[0036] Note that a full 8x8 inverse DCT (with no fast algorithms) needs (13*8+104)*8 = 1664 operations. However, if only one pixel is used from the 8x8 block, then the foregoing shows that the cropped approach computation only needs (13*1 + 104)*1 = 117 operations, a savings of 93%.
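
As a quick arithmetic check of these counts, the per-block formula can be written as a small Python function. The name `cropped_idct_ops` is hypothetical; the formula is the one stated in the text.

```python
def cropped_idct_ops(m, n):
    """Operation count [13*max(m,n) + 104]*min(m,n) for an m x n cropped inverse DCT."""
    lo, hi = min(m, n), max(m, n)
    return (13 * hi + 104) * lo

full_block = cropped_idct_ops(8, 8)      # uncropped 8x8 inverse DCT
single_pixel = cropped_idct_ops(1, 1)    # only one pixel needed
savings = 1.0 - single_pixel / full_block
```

With these definitions `full_block` is 1664, `single_pixel` is 117, and `savings` is about 0.93.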

[0037] Estimate the computational complexity of the selective macroblock decoding by using the foregoing estimates of a single cropped block as follows. Consider Figure 2: for a 16x16 macroblock the largest covered area (broken-line square) is 17x17 (due to half pixel resolution of the motion vector). Therefore, a + b ≤ 9 and c + d ≤ 9. Thus the computational load for each of the 9 blocks is as follows (presuming without loss of generality that a ≤ b, c ≤ d, and b ≤ d):

block 1: (13a+104)c
block 2: (13*8+104)a
block 3: (13d+104)a
block 4: (13*8+104)c
block 5: 1664
block 6: (13*8+104)d
block 7: (13b+104)c
block 8: (13*8+104)b
block 9: (13d+104)b

Therefore the total computation for obtaining all of the pixels needed for the 16x16 motion compensation part of reconstruction is the sum of computations for blocks 1-9, which is 1664 + (13*8+104)(a+b+c+d) + 13(a+b)(c+d) + 104(a+b+2c), and this is at most 8257 operations. The total for bilinear interpolation is 64 operations. The cost of the forward 8x8 DCT is 64*11*2 = 1408 operations. The total operation count for obtaining the reference macroblock, filtering/downsampling, and forward DCT is at most 9729 operations.

[0038] For a 1920x1080 HDTV sequence at 30 frames/second, the worst case scenario is that no B frames are present. At 9729 operations per macroblock, the total computational load is about 2382 MIPS.

With 400 MIPS available, the selective full decoding can be done for about 17% of the macroblocks. If the HDTV sequence is in the format of IBBP (one P frame for every 3 frames), then 400 MIPS could handle about 50% of the P frame macroblocks.
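
The load estimate can be reproduced with a few lines of Python. The function names are hypothetical, and rounding 1080/16 = 67.5 up to 68 macroblock rows is an assumption; it is the choice that reproduces the 2382 MIPS figure quoted for this case.

```python
import math

def macroblocks_per_frame(width=1920, height=1080, mb_size=16):
    """Macroblock count per frame; partial rows are rounded up (assumption)."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

def required_mips(ops_per_macroblock, fps=30):
    """Millions of operations per second if every macroblock is fully decoded."""
    return macroblocks_per_frame() * ops_per_macroblock * fps / 1e6

def affordable_fraction(available_mips, ops_per_macroblock, p_frame_share=1.0):
    """Fraction of P frame macroblocks that fit in the budget.

    p_frame_share is the share of frames that are P frames
    (1.0 with no B frames, 1/3 for an IBBP pattern).
    """
    load = required_mips(ops_per_macroblock) * p_frame_share
    return min(1.0, available_mips / load)

worst_case = required_mips(9729)                      # about 2382 MIPS
no_b_frames = affordable_fraction(400, 9729)          # about 0.17
ibbp = affordable_fraction(400, 9729, 1.0 / 3.0)      # about 0.50
```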

[0039] Adaptive resolution I frame macroblock preferred embodiments

[0040] The I macroblocks may also be categorized into full resolution and reduced resolution decoding analogous to the P macroblocks. In particular, small high frequency components in the I macroblock luminance DCTs permit reduced resolution decoding by downsampling in the DCT domain as previously described. Thus, as with P macroblocks, I macroblocks may be stored either as full resolution or reduced resolution, and when a reduced resolution macroblock is used as a part of a full resolution reference, it is upsampled.

[0041] Other methods for deciding whether to decode in full resolution include current computational load and whether the prior P macroblock in the same location was fixed or not.

[0042] Reduced resolution I macroblocks with adaptive resolution P macroblocks

[0043] The I macroblocks may be all downsampled in the DCT domain and stored as reduced resolution. When a P macroblock is to be fixed and the reference is in an I frame, then upsample the stored reduced resolution I macroblocks as previously described.

B and P frames

[0044] For macroblocks available for DCT domain downsampling (B frame macroblocks and low energy/edge P frame macroblocks), downsample and reconstruct as follows. Divide the motion vector components by 2, round up to the nearest half pixel, and use the previously reconstructed downsampled 8x8 blocks of I and/or P frames stored in a buffer to find the reference blocks. Downsample the macroblocks of residuals (four 8x8 DCT blocks of residuals) in the DCT domain as described in the foregoing for I frame macroblocks to find the 8x8 DCT block of residuals, and apply the inverse DCT to yield the 8x8 block of residuals. Add the 8x8 block of residuals to the 8x8 reference block to complete the reconstruction of the 8x8 block.

Fast DCT method applications 

[0045] The preceding selective decoding for high energy/edge P frame macroblocks to avoid motion vector drift has the advantage of small end to end delay for each pixel, and the code is simple. However, a bit more implementation complexity can significantly reduce the number of operations by combining fast DCT inversion methods with the preceding selective decoding methods.

[0046] There are many methods for performing fast DCT computation. One of the best results is achieved with the following decomposition of the 8x8 DCT matrix into a product of simpler 8x8 matrices:



(1920/16)*(1080/16)*9729*30 operations/second = 2382 MIPS

D[8] = A P B1 B2 M A1 A2 A3

where the factor matrices are:

It takes a total of 42*8 = 336 operations to do the 8 point DCT for either the rows or the columns. Thus the total computation for a two-dimensional 8x8 DCT is 672 operations.

[0047] After applying the foregoing fast DCT on the columns and then applying the cropping matrix, only m nonzero rows exist. The computation for the row DCT then takes only 42m operations. Also, either $A_{cropped}$ or $A_{cropped}^T$ could be computed, so the total computation amounts to 336 + 42min(m,n).

[0048] Now compare the number of operations for the 8x8 inverse DCT with and without the fast factorization, together with cropping for DCT inversion. The number of operations is smaller without the fast factorization if min(m,n) ≤ 3 (equals [104 + 13max(m,n)]min(m,n) operations) and with the fast factorization for min(m,n) ≥ 4 (equals 336 + 42min(m,n) operations).

[0049] Thus the worst case of the reference macroblock covering portions of nine 8x8 blocks as in Figure 33 has the following total number of operations for DCT inversion. Again, without loss of generality take a + b = 9, c + d = 9, and a ≤ c ≤ b ≤ d; then the total number of operations for all possible a and c values is:



a    c    total operations
1    1    3637
1    2    3969
1    3    4301
1    4    3977
2    2    4434
2    3    4753
2    4    4468
3    3    5205
3    4    4959
4    4    4830


The highest number of operations is 5205, and the average is 4453. Factoring in the bilinear interpolation (64) and the forward DCT computation (672), the total computation for one macroblock is 5940 (worst case) and 5189 (average) operations.
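
The per-block selection rule and the nine-block layout can be combined in a short Python sketch that recomputes the totals. The function names are hypothetical; the costs and the threshold at min(m,n) of 3 versus 4 are those given in the text, and the row-by-row numbering of the nine blocks (corners, edges, centre) is an assumption about the figure.

```python
def block_ops(m, n):
    """Cost of inverting an m x n cropped block, per the rule in the text."""
    lo, hi = min(m, n), max(m, n)
    if lo <= 3:
        return (104 + 13 * hi) * lo   # direct cropped inversion
    return 336 + 42 * lo              # fast factorization plus cropping

def reference_ops(a, c):
    """Total inversion cost for a reference macroblock with a + b = 9, c + d = 9."""
    b, d = 9 - a, 9 - c
    shapes = [(a, c), (a, 8), (a, d),
              (8, c), (8, 8), (8, d),
              (b, c), (b, 8), (b, d)]
    return sum(block_ops(m, n) for m, n in shapes)

# All cases with a <= c <= 4 (so that a <= c <= b <= d).
totals = {(a, c): reference_ops(a, c) for a in range(1, 5) for c in range(a, 5)}
worst = max(totals.values())
average = sum(totals.values()) / len(totals)
```

Run as written, the sketch gives a worst case of 5205 operations and an average of about 4453, in agreement with the figures stated above.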

[0050] For a 1920x1080 HDTV sequence (assuming no B frames), the total computation required is for the worst case:

(1920/16)*(1080/16)*5940*30 ops/sec = 1454 MIPS

and for the average case:

(1920/16)*(1080/16)*4453*30 ops/sec = 1090 MIPS

With 400 MIPS, one can do selective macroblock decoding for about 28% of all the macroblocks. Because it is unlikely that all the macroblocks lie on the worst case grid, the average number is a better measure. Using the average number for a macroblock, one can do selective macroblock decoding for 37% of the macroblocks. If the sequence is in IBBP format, one should have enough computation power to perform the inverse motion decoding for almost 100% of the macroblocks for all P frames and thereby avoid motion vector drift.






Interlaced field downsampling 

[0051] For interlaced field format, denote the even and odd numbered lines of the macroblock P as $P^e$ and $P^o$, respectively. Thus $P^e$ and $P^o$ are 8x16 fields, and each can be considered as made of two blocks: $P^e = P_0^e + P_1^e$ and $P^o = P_0^o + P_1^o$; this is analogous to the foregoing decomposition of P into four blocks. Then downsample the rows of $P^e$ and $P^o$ as previously to obtain $P^e_{down}$ and $P^o_{down}$, where $P^e_{down}$ and $P^o_{down}$ are 8x8 blocks.

The 8x8 DCT of $P_{down}$, the 8x8 downsampled P, can be written as the average of the DCTs of $P^e_{down}$ and $P^o_{down}$.

The whole procedure for one macroblock requires computing two matrix multiplications, which take 336*2 = 672 operations. The averaging takes another 64 operations (scaling will be done at the end). The total count is 736 operations per macroblock. Therefore, field macroblocks can be downsampled with fewer operations than 16x16 macroblocks.

Set-top box 

[0052] A preferred embodiment set-top box illustrated in Figure 3 includes the demodulation (tuner, PLL synthesis, IQ demodulation, ADC, VLD, FEC) and MPEG-2 decoding of an incoming high resolution signal. The MPEG-2 decoder uses the preferred embodiments of the foregoing description.

Further details of the downsampling plus a repacking of chrominance blocks for easy inverse DCT follow. Also, a description of a decoder (AV310) is appended.

[0053] Aspects of the present invention include methods and apparatus for transcoding and decoding a frequency domain encoded HDTV data stream for presentation on a standard definition television. In the following description, specific information is set forth to provide a thorough understanding of the present invention. Well-known circuits and devices are included in block diagram form in order not to complicate the description unnecessarily. Moreover, it will be apparent to one skilled in the art that specific details of these blocks are not required in order to practice the present invention.

[0054] Figure 22 is a block diagram showing a transcoder 1000 and an SDTV decoder 2000 according to the present invention connected to a standard definition television set 3000. A frequency domain encoded data stream 990 is connected to an input terminal of transcoder 1000. Data stream 990 is encoded according to the MPEG standard, which is well known, and contains both an audio data stream and a video data stream. The video data stream contains frequency domain encoded data which represents a high definition television (HDTV) picture.

[0055] Figures 23A and 23B are a flowchart illustrating a transcoding process and a decoding process according to the present invention. Figure 23A illustrates the transcoding process performed by transcoder 1000. An MPEG transport stream is provided to input "A." A parse block examines the MPEG transport stream and extracts a video data stream, which is encoded according to the MPEG standard. A "find header" block then synchronizes to the video data stream and extracts a set of macro blocks. Each macro block is a frequency domain encoded representation of a 16 x 16 pixel region in a picture frame. A complete HDTV picture frame has 1920 x 1080 pixels. A "VLD" block then performs a variable length decode on each macro block to obtain four luminance subblocks and two chrominance subblocks. Each set of luminance subblocks is downsampled by 2:1 in both an x and a y direction to get a total reduction of 4:1. Each chrominance subblock is downsampled in one direction to get a 2:1 reduction. Advantageously, and according to the present invention, the downsampling step is done in the frequency domain.

[0056] Still referring to Figure 23A, block VLC now encodes the six subblocks formed by the downsampling step with a variable length code to form a new macro block that represents an 8 x 8 pixel region. In this manner, an HDTV picture frame with a resolution of 1920 x 1080 is transcoded to a pseudo SDTV picture frame with a resolution of 960 x 540 pixels. Next, the video data stream is reconstructed using the macro blocks formed by the downsampling step and combining them with header information from the original data stream that has been edited to reflect the current format of the video data stream. Finally, the transport stream is reconstructed by combining the reconstructed video stream with the audio data stream. This reconstructed MPEG transport stream is advantageously compatible with any fully compliant MPEG decoder and is provided on output "B."

[0057] Figure 23B illustrates the decoding process. The reconstructed MPEG transport stream is decoded and converted to a spatial domain data stream that conforms to the NTSC format and is provided on output "C." An NTSC picture frame can be represented as a picture frame with 720 x 480 pixels, as illustrated in Figure 24.

[0058] Figures 25 and 26 are flow diagrams which illustrate the operation of the transcoder and decoder of Figure 22. Three macro blocks are processed at a time. Each macro block has a 4:2:0 format and represents a picture frame which has a resolution of 1920 x 1080. All three are downsampled in the frequency domain and then combined in reconstruction block 1015 (Figure 23A) while still in the frequency domain to form a single new macro block which has a 4:2:2 format and represents a picture frame which has a resolution of 960 x 540. Thus, each new macro block represents three scaled original macro blocks.

[0059] Figure 27 illustrates the effect of transcoding according to the present invention. According to the MPEG-2 specification, an HDTV source picture is represented in the spatial domain by a number of 16 x 16 blocks of luminance values, one for each pixel. Block 1050 is one such block of luminance values. Block 1050 is composed of four subblocks: bij, cij, dij and eij. In order to reduce the resolution of an HDTV frame for display on a standard definition TV, it would be desirable to filter block 1050 to obtain an equivalent block which represents only 8x8 pixels. However, this cannot be done directly since the MPEG-2 encoding process transmits a frequency domain block 1051 that is formed by a DCT. In block 1051, the four subblocks are now frequency domain blocks Bij, Cij, Dij, and Eij. According to the present invention, a downsampling is performed in the frequency domain, so that block 1051 does not need to be converted to the spatial domain by performing a compute-intensive inverse DCT. Thus, the resulting block 1052 is a frequency domain block that represents 8x8 pixels and is a function of Bij, Cij, Dij, and Eij.

[0060] According to MPEG-2, a video sequence is represented by a series of I frames interspersed with P frames and B frames. An I frame contains a complete picture frame, while B frames and P frames contain motion vectors and sparsely populated arrays of image data. According to the present invention, motion vectors are also scaled down corresponding to the downsampling of the image data.

[0061] The technique for downsampling the luminance and chrominance image data in the frequency domain will now be described in detail.

Luminance Downsampling in the DCT domain 

[0062] Note for all calculations the scale factor is ignored to reduce complexity. Small letters a, b, c, d, e indicate spatial domain coefficients and capital letters A, B, C, D, E indicate frequency (DCT) domain coefficients.

[0063] Presume a 16x16 block made up of four 8x8 blocks as shown in Figure 27. The four 8x8 blocks have coefficients b(i,j), c(i,j), d(i,j), e(i,j), with $0 \le i,j \le 7$, respectively, and the combined 16x16 has coefficients a(i,j) with $0 \le i,j \le 15$. Thus a(i,j) = b(i,j) for $0 \le i,j \le 7$; a(i,j) = c(i,j-8) for $0 \le i \le 7$ and $8 \le j \le 15$; a(i,j) = d(i-8,j) for $8 \le i \le 15$ and $0 \le j \le 7$; and a(i,j) = e(i-8,j-8) for $8 \le i,j \le 15$.

The 8x8 DCT of the four 8x8 blocks gives coefficients:

$$B(u,v) = \sum_i \sum_j b(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16]$$

$$\vdots$$

$$E(u,v) = \sum_i \sum_j e(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16]$$

where the sums are over $0 \le i \le 7$ and $0 \le j \le 7$. Similarly,

$$A(u,v) = \sum_i \sum_j a(i,j) \cos[(2i+1)u\pi/32] \cos[(2j+1)v\pi/32]$$

where the sums are over $0 \le i \le 15$ and $0 \le j \le 15$.

[0064] For even terms:

$$\begin{aligned}
A(2u,2v) &= \sum_i \sum_j a(i,j) \cos[(2i+1)2u\pi/32] \cos[(2j+1)2v\pi/32] \\
&= \sum_i \sum_j a(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&= \sum_i \sum_j a(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&\quad + \sum_i \sum_j a(i,j+8) \cos[(2i+1)u\pi/16] \cos[(2(j+8)+1)v\pi/16] \\
&\quad + \sum_i \sum_j a(i+8,j) \cos[(2(i+8)+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&\quad + \sum_i \sum_j a(i+8,j+8) \cos[(2(i+8)+1)u\pi/16] \cos[(2(j+8)+1)v\pi/16]
\end{aligned}$$

where the first sum over $0 \le i \le 15$ and $0 \le j \le 15$ has been broken up into four sums, each over $0 \le i \le 7$ and $0 \le j \le 7$. Using $\cos(x + n\pi) = (-1)^n \cos x$ yields

$$\begin{aligned}
A(2u,2v) &= \sum_i \sum_j a(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&\quad + (-1)^v \sum_i \sum_j a(i,j+8) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&\quad + (-1)^u \sum_i \sum_j a(i+8,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16] \\
&\quad + (-1)^{u+v} \sum_i \sum_j a(i+8,j+8) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16]
\end{aligned}$$

Hence $A(2u,2v) = B(u,v) + (-1)^v C(u,v) + (-1)^u D(u,v) + (-1)^{u+v} E(u,v)$.
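
The even-term relation can be verified numerically. The following pure-Python sketch uses the unscaled cosine sums of the text; the helper names `dct_coeff`, `split_quadrants`, and `max_even_term_error` are hypothetical.

```python
import math
import random

def dct_coeff(block, u, v):
    """Unscaled 2-D DCT coefficient (u, v) of a square block, as in the text."""
    n = len(block)
    return sum(block[i][j]
               * math.cos((2 * i + 1) * u * math.pi / (2 * n))
               * math.cos((2 * j + 1) * v * math.pi / (2 * n))
               for i in range(n) for j in range(n))

def split_quadrants(a):
    """Split a 16x16 block a into the 8x8 blocks b, c, d, e of the text."""
    b = [row[:8] for row in a[:8]]
    c = [row[8:] for row in a[:8]]
    d = [row[:8] for row in a[8:]]
    e = [row[8:] for row in a[8:]]
    return b, c, d, e

def max_even_term_error(a):
    """Largest gap between A(2u,2v) and B + (-1)^v C + (-1)^u D + (-1)^(u+v) E."""
    b, c, d, e = split_quadrants(a)
    worst = 0.0
    for u in range(8):
        for v in range(8):
            direct = dct_coeff(a, 2 * u, 2 * v)
            combined = (dct_coeff(b, u, v)
                        + (-1) ** v * dct_coeff(c, u, v)
                        + (-1) ** u * dct_coeff(d, u, v)
                        + (-1) ** (u + v) * dct_coeff(e, u, v))
            worst = max(worst, abs(direct - combined))
    return worst

random.seed(0)
pixels = [[random.randint(0, 255) for _ in range(16)] for _ in range(16)]
error = max_even_term_error(pixels)   # expected to be at rounding-error level
```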
[0065] For odd terms:

$$\begin{aligned}
A(2u+1,2v+1) &= \sum_i \sum_j a(i,j) \cos[(2i+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32] \\
&= \sum_i \sum_j b(i,j) \cos[(2i+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j c(i,j) \cos[(2i+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j d(i,j) \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j e(i,j) \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32]
\end{aligned}$$

where the first sum over $0 \le i \le 15$ and $0 \le j \le 15$ has been broken up into four sums, each over $0 \le i \le 7$ and $0 \le j \le 7$. Substituting in the inverse DCTs for the spatial coefficients yields:

$$\begin{aligned}
A(2u+1,2v+1) &= \sum_i \sum_j \Big[\sum_m \sum_n B(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2i+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j \Big[\sum_m \sum_n C(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2i+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j \Big[\sum_m \sum_n D(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32] \\
&\quad + \sum_i \sum_j \Big[\sum_m \sum_n E(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32]
\end{aligned}$$

with the interior sums over $0 \le m \le 7$ and $0 \le n \le 7$.

[0066] Switch the order of summation:

$$A(2u+1,2v+1) = \sum_m \sum_n B(m,n) B^*(m,n,u,v) + \sum_m \sum_n C(m,n) C^*(m,n,u,v) + \sum_m \sum_n D(m,n) D^*(m,n,u,v) + \sum_m \sum_n E(m,n) E^*(m,n,u,v)$$

where

$$B^*(m,n,u,v) = \sum_i \sum_j \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16] \cos[(2i+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32]$$

$$C^*(m,n,u,v) = \sum_i \sum_j \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16] \cos[(2i+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32]$$

$$D^*(m,n,u,v) = \sum_i \sum_j \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16] \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2j+1)(2v+1)\pi/32]$$

$$E^*(m,n,u,v) = \sum_i \sum_j \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16] \cos[(2(i+8)+1)(2u+1)\pi/32] \cos[(2(j+8)+1)(2v+1)\pi/32]$$

with the sums over $0 \le i \le 7$ and $0 \le j \le 7$.



[0067] Taking just the lower frequency 8x8 block of A (which corresponds to 0 ≤ u ≤ 3 and 0 ≤ v ≤ 3 in the foregoing expressions for A(2u,2v) and A(2u+1,2v+1)) provides the downsampling in the DCT domain. An 8x8 inverse DCT on this 8x8 block of A yields the spatial downsample.

Chrominance downsampling in the DCT domain 

[0068] The two 8x8 chrominance blocks of a macroblock may be downsampled by a factor of 2 in the DCT domain and repacked to form a single 8x8 block. Then an inverse DCT on this repacked 8x8 block will recover the two 8x4 downsampled spatial chrominance blocks. See Figure 27b and the following calculations with 8x4 B(u,v) denoting the low frequency half of the 8x8 Cb DCT and 8x4 C(u,v) the low frequency half of the 8x8 Cr DCT. Let b(i,j) and c(i,j) be the two 8x4 inverse DCTs of B(u,v) and C(u,v), respectively; so b and c are the downsampled spatial chrominance.

[0069] Let a(i,j) = b(i,j) for $0 \le i \le 7$ and $0 \le j \le 3$ and a(i,j) = c(i,j-4) for $0 \le i \le 7$ and $4 \le j \le 7$.

$$A(u,v) = \sum_i \sum_j a(i,j) \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16]$$

where the sum is over $0 \le i \le 7$ and $0 \le j \le 7$.

[0070] Split the sum into two sums corresponding to $0 \le j \le 3$ and $4 \le j \le 7$, and denote the sum over $0 \le i \le 7$ and $0 \le j \le 3$ as $A^1(u,v)$ and the sum over $0 \le i \le 7$ and $4 \le j \le 7$ as $A^2(u,v)$. Thus $A(u,v) = A^1(u,v) + A^2(u,v)$.

[0071] Insert the definition of a(i,j) in terms of b(i,j) and c(i,j), and b(i,j) and c(i,j) in terms of B(m,n) and C(m,n), into these sums:

$$A^1(u,v) = \sum_i \sum_j \Big[\sum_m \sum_n B(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2i+1)u\pi/16] \cos[(2j+1)v\pi/16]$$

where the sums are over $0 \le i \le 7$, $0 \le j \le 3$, $0 \le m \le 7$, $0 \le n \le 7$.

[0072] Reordering the sums yields:

$$A^1(u,v) = \sum_n \sum_j B(u,n) \cos[(2j+1)n\pi/16] \cos[(2j+1)v\pi/16]$$

where $B(u,n) = \sum_m \sum_i B(m,n) \cos[(2i+1)m\pi/16] \cos[(2i+1)u\pi/16]$ (by orthogonality of the cosines over i, this sum is B(u,n) up to the ignored scale factors). Thus $A^1(u,v) = \sum_n B(u,n) B^*(v,n)$ where $B^*(v,n) = \sum_j \cos[(2j+1)n\pi/16] \cos[(2j+1)v\pi/16]$. Similarly for $A^2$:

$$A^2(u,v) = \sum_i \sum_j \Big[\sum_m \sum_n C(m,n) \cos[(2i+1)m\pi/16] \cos[(2j+1)n\pi/16]\Big] \cos[(2i+1)u\pi/16] \cos[(2j+9)v\pi/16]$$

where the sums are over $0 \le i \le 7$, $0 \le j \le 3$, $0 \le m \le 7$, $0 \le n \le 7$.

[0073] Reordering the sums yields:

$$A^2(u,v) = \sum_n \sum_j C(u,n) \cos[(2j+1)n\pi/16] \cos[(2j+9)v\pi/16]$$

where $C(u,n) = \sum_m \sum_i C(m,n) \cos[(2i+1)m\pi/16] \cos[(2i+1)u\pi/16]$. Thus $A^2(u,v) = \sum_n C(u,n) C^*(v,n)$ where $C^*(v,n) = \sum_j \cos[(2j+1)n\pi/16] \cos[(2j+9)v\pi/16]$.

Combining: $A(u,v) = \sum_n [B(u,n) B^*(v,n) + C(u,n) C^*(v,n)]$.

[0074] Note that in the definition of $C^*$ the terms include $\cos[(2j+9)v\pi/16]$, which can be expanded:

$$\cos[(2j+9)v\pi/16] = \cos[(2j+1)v\pi/16 + v\pi/2] = \cos[(2j+1)v\pi/16]\cos[v\pi/2] - \sin[(2j+1)v\pi/16]\sin[v\pi/2]$$

and

$$\sin[v\pi/2] = 0, 1, 0, -1, \ldots \text{ for } v = 0, 1, 2, 3, \ldots$$

$$\cos[v\pi/2] = 1, 0, -1, 0, \ldots \text{ for } v = 0, 1, 2, 3, \ldots$$

Thus for even v:

$A^2(u,v) = \pm \sum_n C(u,n) \sum_j \cos[(2j+1)n\pi/16] \cos[(2j+1)v\pi/16]$ with the + sign for v = 0 and 4 and the - sign for v = 2 and 6. Note that the sum of cosines is just $B^*(v,n)$.

[0075] Combining: $A(u,v) = \sum_n [B(u,n) \pm C(u,n)] B^*(v,n)$ for v even (with the sign as above), which reduces the computation compared to the general expression for A(u,v).
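
The simplification for even v rests on the cosine sum with the (2j+9) phase equalling plus or minus the sum with the (2j+1) phase. A minimal Python check is sketched below; the function names `b_star`, `c_star`, and `sign_for_even_v` are hypothetical and follow the unscaled sums over 0 ≤ j ≤ 3 used in the text.

```python
import math

def b_star(v, n):
    """Sum over j = 0..3 of cos[(2j+1)n*pi/16] * cos[(2j+1)v*pi/16]."""
    return sum(math.cos((2 * j + 1) * n * math.pi / 16)
               * math.cos((2 * j + 1) * v * math.pi / 16)
               for j in range(4))

def c_star(v, n):
    """Sum over j = 0..3 of cos[(2j+1)n*pi/16] * cos[(2j+9)v*pi/16]."""
    return sum(math.cos((2 * j + 1) * n * math.pi / 16)
               * math.cos((2 * j + 9) * v * math.pi / 16)
               for j in range(4))

def sign_for_even_v(v):
    """cos(v*pi/2) for even v: +1 for v = 0, 4 and -1 for v = 2, 6."""
    return 1 if v % 4 == 0 else -1

# For every even v the two sums agree up to the stated sign.
max_gap = max(abs(c_star(v, n) - sign_for_even_v(v) * b_star(v, n))
              for v in (0, 2, 4, 6) for n in range(8))
```

For odd v the sine term does not vanish, so no such shortcut applies and the general expression is needed.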

Reduction and control of computation rate

[0076] Other than the even terms of luminance, the other computations are in the form of

$$\sum\sum A(u,v) A^*(u,v) + \sum\sum B(u,v) B^*(u,v)$$

with the A(u,v) and B(u,v) terms in the frequency (DCT) domain, and most of the higher order terms will be zero. We can sum the terms in zigzag order, and the average number of nonzero terms for an 8x8 block is about 20. During the variable length decoding stage we know the number of nonzero terms and the highest terms which are not zero in zigzag order. A monitoring process detects cases of an abnormal number of nonzero terms by checking the amount of time and the number of blocks remaining to be processed, and starts truncation of higher frequencies.

[0077] Figure 28 is a block diagram illustrating the transcoder and decoder of Figure 22 in more detail. Preprocessor 1100 performs the computations described above on each macro block. DRAM 1110 provides storage for a portion of the data stream. Preprocessor 1100 forms two streams of downsampled data, IN_A and IN_B, that are passed to two MPEG decoder circuits, 2010 and 2011, respectively. Two processors are used in order to provide sufficient computational resources to decode and filter the pseudo SDTV data stream. These processors are described in detail with respect to Figures 1-21. It should be noted that this is not a limiting aspect of the present invention. A single decode circuit with sufficient computing power can replace circuits 2010 and 2011.

[0078] Advantageously, each processor circuit 2010/2011 needs to decode only one half of the B frames. Each processor circuit is provided with all of the I frames and all of the P frames so that any B frame can be decoded by either processor. Mux 2020 is controlled to select a correct order of display frames which are output on OUT_A and OUT_B.

[0079] The normal bitstream has the following decoding sequence for I (intra), P (predicted) and B (bi-directional predicted) pictures:

Decoding sequence: I0 P3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10

After the preprocessor:

IN_A has: I0 P3 B1 P6 B4 P9 B7 P12 B10 ...

IN_B has: I0 P3 B2 P6 B5 P9 B8 P12 B11 ...

With three frames time, decoder A decodes P3 B1 and decoder B decodes P3 B2.

[0080] Display sequence:

OUT_A: I0 B1 B4 P6 B7 B10 P12 B13

OUT_B: B2 P3 B5 B8 P9 B11 ...

For each decoder, every six frames time displays three pictures.
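
The way the preprocessor shares frames between the two decoders can be sketched in Python. This assumes, as an illustration only, that frames are labelled by type and display index (for example "B4") and that B frames are dealt alternately in decoding order; the function name `split_for_two_decoders` and the continuation of the sequence through B11 are hypothetical.

```python
def split_for_two_decoders(decode_order):
    """Give every I and P frame to both decoders and alternate the B frames."""
    stream_a, stream_b = [], []
    b_seen = 0
    for frame in decode_order:
        if frame[0] in ("I", "P"):
            # Reference frames are needed by both decoders.
            stream_a.append(frame)
            stream_b.append(frame)
        else:
            # B frames are never used as references, so each goes to one decoder.
            (stream_a if b_seen % 2 == 0 else stream_b).append(frame)
            b_seen += 1
    return stream_a, stream_b

decode_order = ["I0", "P3", "B1", "B2", "P6", "B4", "B5",
                "P9", "B7", "B8", "P12", "B10", "B11"]
in_a, in_b = split_for_two_decoders(decode_order)
```

Each decoder then handles all reference frames but only half of the B frames, which is the load reduction described in the text.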

[0081] Figure 29 is a block diagram of the transcoder of Figure 22. Transcoder 1000 has three processing units 1200-1202 that are essentially identical. Each processing unit has four arithmetic units. A dual port RAM 1300 is organized so that while one half is being written with new data from the incoming MPEG macro blocks, the other half is accessed by the four arithmetic units. CPU 1400 performs steps 1010-1012 (Figure 23A) and provides macro blocks to each dual port RAM 1300.

[0082] Processors 2010 and 2011 will now be described in more detail. In the following descriptions, references to AV310 refer to processors 2010 and 2011.

40 [0083] Referring now to Figure 1 there may be seen a high level functional block diagram of a circuit 200 that forms 
a portion of an audio-visual system of the present invention and its interfaces with off-chip devices and/or circuitry. 
More particularly, there may be seen the overall functional architecture of a circuit including on-chip interconnections 
that Is preferably implemented on a single chip as depicted by the dashed line portion of Figure 1 . 
[0084] As depicted inside the dashed line portion of Figure 1, this circuit consists of a transport packet parser (TPP)
block 210 that includes a bitstream decoder or descrambler 212 and clock recovery circuitry 214, an ARM CPU block
220, a data ROM block 230, a data RAM block 240, an audio/video (A/V) core block 250 that includes an MPEG-2
audio decoder 254 and an MPEG-2 video decoder 252, an NTSC/PAL video encoder block 260, an on screen display
(OSD) controller block 270 to mix graphics and video that includes a bitBLT hardware (H/W) accelerator 272, a
communication co-processors (CCP) block 280 that includes connections for two UART serial data interfaces, infra red (IR)
and radio frequency (RF) inputs, SIRCS input and output, an I2C port and a Smart Card interface, a P1394 interface
(I/F) block 290 for connection to an external 1394 device, an extension bus interface (I/F) block 300 to connect
peripherals such as additional RS232 ports, display and control panels, external ROM, DRAM, or EEPROM memory, a
modem and an extra peripheral, and a traffic controller (TC) block 310 that includes an SRAM/ARM interface (I/F) 312
and a DRAM I/F 314. There may also be seen an internal 32 bit address bus 320 that interconnects the blocks and an
internal 32 bit data bus 330 that interconnects the blocks. External program and data memory expansion allows the
circuit to support a wide range of audio/video systems, especially, for example, but not limited to, set-top boxes, from
low end to high end.

[0085] The consolidation of all these functions onto a single chip with a large number of inputs and outputs allows
for removal of excess circuitry and/or logic needed for control and/or communications when these functions are
distributed among several chips, and allows for simplification of the circuitry remaining after consolidation onto a single
chip. More particularly, this consolidation results in the elimination of the need for an external CPU to control, or
coordinate control, of all these functions. This results in a simpler and cost-reduced single chip implementation of the
functionality currently available only by combining many different chips and/or by using special chipsets. However, this
circuit, by its very function, requires a large number of inputs and outputs, entailing a high number of pins for the chip.
[0086] In addition, a JTAG block is depicted that allows for testing of this circuit using a standard JTAG interface that
is interconnected with this JTAG block. As more fully described later herein, this circuit is fully JTAG compliant, with
the exception of requiring external pull-up resistors on certain signal pins (not depicted) to permit 5 V inputs for use in
mixed voltage systems.

[0087] In addition, Figure 1 depicts that the circuit is interconnected to a plurality of other external blocks. More
particularly, Figure 1 depicts a set of external memory blocks. Preferably, the external memory is SDRAM, although
clearly, other types of RAM may be so employed. The external memory 300 is described more fully later herein. The
incorporation of any or all of these external blocks and/or all or portions of the external memories onto the chip is
contemplated by and within the scope of the present invention.

[0088] Referring now to Figure 2, it may be seen how the circuitry ('AV310) accepts a transport bitstream from the
output of a Forward Error Correction (FEC) device with a maximum throughput of 40 Mbits/s or 7.5 Mbytes/s. The
Transport Packet Parser (TPP) in the 'AV310 processes the header of each packet and decides whether the packet
should be discarded, further processed by the ARM CPU, or if the packet only contains relevant data and needs to be
stored without intervention from the ARM. The TPP sends all packets requiring further processing or containing relevant
data to the internal RAM via the Traffic Controller (TC). The TPP also activates or deactivates the decryption engine
(DES) based on the content of an individual packet. The conditional access keys are stored in RAM and managed by
special firmware running on the ARM CPU. The data transfer from TPP to SRAM is done via DMA set up by the Traffic
Controller (TC).

[0089] Further processing on the packet is done by the ARM firmware, which is activated by interrupt from the TPP
after the completion of the packet data transfer. Two types of transport packets are stored in the RAM and managed
as a first-in first-out (FIFO). One is for pure data which will be routed to SDRAM without intervention from the ARM,
and the other is for packets that need further processing. Within the interrupt service routine, the ARM checks the FIFO
for packets that need further processing, performs necessary parsing, removes the header portion, and establishes
DMA for transferring payload data from RAM to SDRAM. The Traffic Controller repacks the data and gets rid of the
voids created by any header removal.

[0090] Together with the ARM, the TPP also handles System Clock Reference (SCR) recovery with an external
VCXO. The TPP will latch and transfer to the ARM its internal system clock upon the arrival of any packet which may
contain system clock information. After further processing on the packet and identifying the system clock, the ARM
calculates the difference between the system clock from a bitstream and the actual system clock at the time the packet
arrives. Then, the ARM filters the difference and sends it through a Sigma-Delta DAC in the TPP to control an external
voltage controlled oscillator (VCXO). During start-up when there is no incoming SCR, the ARM will drive the VCXO to
its center frequency.
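
The recovery loop described in paragraph [0090] can be sketched roughly as follows. This is an illustration only: the text says the ARM "filters the difference" without naming the filter, so the first-order low-pass filter, its coefficient `alpha`, and the function names here are assumptions rather than the 'AV310 firmware.

```python
def scr_error(scr_from_bitstream: int, latched_local_clock: int) -> int:
    """Difference between the SCR carried in the packet and the local
    system clock value latched by the TPP when that packet arrived."""
    return scr_from_bitstream - latched_local_clock


def filter_step(previous: float, error: int, alpha: float = 0.125) -> float:
    """One step of an assumed first-order low-pass filter.
    `alpha` is a hypothetical smoothing coefficient, not from the source."""
    return previous + alpha * (error - previous)


# Example: the local clock lags the transmitted SCR by a constant 40 ticks.
# The filtered value (which would feed the Sigma-Delta DAC) rises gradually.
control = 0.0
for _ in range(3):
    control = filter_step(control, scr_error(1_000_040, 1_000_000))
```

A smoothed control value of this kind avoids driving the VCXO with the raw, jittery per-packet difference.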

[0091] The TPP will detect packets lost from the transport stream. With error concealment by the audio/video decoder
and the redundant header from DSS bitstream, the 'AV310 minimizes the effect of lost data.

[0092] After removing packet headers and other system related information, both audio and video data is stored in
external SDRAM. The video and audio decoders then read the bitstream from SDRAM and process it according to the
ISO standards. The chip decodes MPEG-1 and MPEG-2 main profile at main level for video and Layer I and II MPEG-1
and MPEG-2 for audio. Both Video and Audio decoders synchronize their presentation using the transmitted
Presentation Time Stamps (PTS). In a Digital Satellite System (DSS), the PTS is transmitted as picture user data in the
video bitstream and an MPEG-1 system packet bitstream for audio. Dedicated hardware decodes the PTS if it is in the
MPEG-1 system packet and forwards it to the audio decoder. The video decoder decodes PTS from picture user data.
Both Video and Audio decoders compare PTS to the local system clock in order to synchronize presentation of
reconstructed data. The local system clock is continuously updated by the ARM. That is, every time the System Clock
Reference of a selected SCID is received and processed, the ARM will update the decoder system clock.

[0093] The Video decoder is capable of producing decimated pictures using 1/2 or 1/4 decimation per dimension,
which results in reduced areas of 1/4 or 1/16. The decimated picture can be viewed in real time. Decimation is achieved
by using field data out of a frame, skipping lines, and performing vertical filtering to smooth out the decimated image.
[0094] When decoding a picture from a digital recorder, the decoder can handle trick modes (decode and display I
frame only), with the limitation that the data has to be a whole picture instead of several intra slices. Random bits are
allowed in between trick mode pictures. However, if the random bits emulate any start code, it will cause unpredictable
decoding and display errors.

[0095] Closed Caption (CC) and Extended Data Services (EDS) are transmitted as picture layer user data. The video
decoder extracts the CC and EDS information from the video bitstream and sends it to the NTSC/PAL encoder module.

[0096] The video decoder also extracts the aspect ratio from the bitstream and sends it to the ARM, which prepares
data according to the Video Aspect Ratio Identification Signal (VARIS) standard, EIAJ CPX-1204. The ARM then
sends it to the NTSC/PAL encoder and OSD module.

[0097] The OSD data may come from the user data in the bitstream or may be generated by the application executed
on the ARM. Regardless of the source, the OSD data will be stored in the SDRAM and managed by the ARM. However,
there is only limited space in the SDRAM for OSD. Applications that require large quantities of OSD data have to store
them in an external memory attached to the Extension Bus. Based on the request from the application, the ARM will
turn the OSD function on and specify how and where the OSD will be mixed and displayed along with the normal video
sequence. The OSD data can be represented in one of the following forms: bitmap, graphics 4:4:4 component, CCIR
601 4:2:2 component, or just background color. A special, dedicated bitBLT hardware expedites memory block moves
between different OSDs.

[0098] The conditional access is triggered by the arrival of a Control Word Packet (CWP). The ARM firmware
recognizes a CWP has been received and hands it to the Verifier, which is News Datacom (NDC) application software
running on the ARM. The Verifier reads the CWP and communicates with the external Smart Card through a UART I/O
interface. After verification, it passes the pointer to an 8 byte key back to the firmware, which then loads the key for
the DES to decrypt succeeding packets.

[0099] The 32-bit ARM processor running at 40.5 MHz and its associated firmware provide the following: initialization
and management of all hardware modules; service for selected interrupts generated by hardware modules and I/O
ports; and application program interface (API) for users to develop their own applications.

[0100] All the firmware will be stored in the on-chip 12K bytes ROM, except the OSD graphics and some generic run
time support. The 4.5K bytes on-chip RAM provides the space necessary for the 'AV310 to properly decode transport
bitstreams without losing any packets. The run-time support library (RTSL) and all user application software are located
outside the 'AV310. Details of the firmware and RTSL are provided in the companion software specification document.

[0101] There are two physical DMA channels managed by the Traffic Controller to facilitate large block transfers
between memories and buffers. That is, as long as there is no collision in the source and destination, it is possible to
have two concurrent DMA transfers. The detailed description of DMA is provided in the section on the Traffic Controller.
[0102] The 'AV310 accepts DSS transport packet data from a front end such as a forward error correction (FEC)
unit. The data is input 8 bits at a time, using a byte clock, DCLK. PACCLK high signals valid packet data. DERROR is
used to indicate a packet that has data errors. The timing diagram in Figure 3 shows the input timing.

[0103] The 'AV310 includes an interface to the Smart Card access control system. The interface consists of a high
speed UART and logic to comply with the News Datacom specification (Document # HU-Tq52, Release E dated November
1994, and Release F dated January 1996) "Directv Project: Decoder-Smart Card Interface Requirements." Applicable
software drivers that control the interface are also included, and are shown in the companion software document.

[0104] It should be noted that the 'AV310 is a 3.3 volt device, while the Smart Card requires a 5 volt interface. The
'AV310 will output control signals to turn the card's VCC and VPP on and off as required, but external switching will
be required. It is also possible that external level shifters may be needed on some of the logic signals.
[0105] An NTSC/PAL pin selects between an NTSC and a PAL output. Changing between NTSC and PAL mode requires
a hardware reset of the device.

[0106] The 'AV310 produces an analog S-video signal on two separate channels, the luminance (Y) and the
chrominance (C). It also outputs the analog composite (Comp) signal. All three outputs conform to the RS170A standard.
[0107] The 'AV310 also supports Closed Caption and Extended Data Services. The analog output transmits CC data
as ASCII code during the twenty-first video line. The NTSC/PAL encoder module inserts VARIS codes into the 20th
video line for NTSC and 23rd line for PAL.

[0108] The digital output provides video in either 4:4:4 or 4:2:2 component format, plus the aspect ratio VARIS code
at the beginning of each video frame. The video output format is programmable by the user but defaults to 4:2:2. The
content of the video could be either pure video or the blended combination of video and OSD.
[0109] The pin assignments for the digital video output signals are:

YCOUT(8)     8-bit Cb/Y/Cr/Y and VARIS multiplexed data output
YCCLK(1)     27 MHz or 40.5 MHz clock output
YCCTRL(2)    2-bit control signals to distinguish between Y/Cb/Cr components and VARIS code

[0110] The interpretation of YCCTRL is defined in the following table.

Table 1. Digital Output Control

SIGNALS         YCCTRL[1]    YCCTRL[0]
Component Y     0            0
Component Cb    0            1
Component Cr    1            0
VARIS code      1            1
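
As a small illustration of Table 1, the two YCCTRL bits can be decoded with a lookup. The function and dictionary names are hypothetical, and the packing of the two signals into one integer (bit 1 = YCCTRL[1], bit 0 = YCCTRL[0]) is an assumption made for the example.

```python
# Mapping of (YCCTRL[1], YCCTRL[0]) to the kind of data on YCOUT, per Table 1.
YCCTRL_MEANING = {
    (0, 0): "Y",
    (0, 1): "Cb",
    (1, 0): "Cr",
    (1, 1): "VARIS",
}


def decode_ycctrl(ycctrl: int) -> str:
    """Return which component the current YCOUT byte carries.
    `ycctrl` is assumed to hold YCCTRL[1] in bit 1 and YCCTRL[0] in bit 0."""
    if not 0 <= ycctrl <= 3:
        raise ValueError("YCCTRL is a 2-bit value")
    return YCCTRL_MEANING[((ycctrl >> 1) & 1, ycctrl & 1)]
```

For example, `decode_ycctrl(0b01)` identifies a Cb sample and `decode_ycctrl(0b11)` identifies a VARIS code byte.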



[0111] The aspect ratio VARIS code includes 14 bits of data plus a 6-bit CRC, to make a total of 20 bits. In NTSC
the 14-bit data is specified as shown in Table 2.

Table 2. VARIS Code Specification

Word       Bit Number     Contents
Word0 A    1              Communication aspect ratio: 1 = full mode (16:9), 0 = 4:3
           2              Picture display system: 1 = letter box, 0 = normal
           3              Not used
Word0 B    4, 5, 6        Identifying information for the picture and other signals (sound signals)
                          that are related to the picture transmitted simultaneously
Word1      4-bit range    Identification code associated to Word0
Word2      4-bit range    Identification code associated to Word0 and other information



[0112] The 6-bit CRC is calculated, with the preset value to be all 1, based on the equation G(X) = X^6 + X + 1.
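
A minimal sketch of such a CRC follows, assuming the generator G(X) = X^6 + X + 1 with the register preset to all ones. The bit ordering (most significant bit first) and the function names are assumptions for illustration; the text does not specify the order in which the 14 data bits enter the register.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def crc6_bits(bits, preset: int = 0x3F) -> int:
    """Run bits through a 6-bit shift register for G(X) = X^6 + X + 1.
    The register starts at `preset` (all ones by default)."""
    reg = preset
    for bit in bits:
        feedback = bit ^ ((reg >> 5) & 1)
        reg = (reg << 1) & 0x3F
        if feedback:
            reg ^= 0x03  # the X + 1 terms of G(X)
    return reg


def varis_crc6(data14: int) -> int:
    """CRC over 14 data bits, fed MSB first (the ordering is an assumption)."""
    return crc6_bits(to_bits(data14, 14))
```

With this register form, feeding the 14 data bits followed by the 6 CRC bits leaves the register at zero, which gives a receiver a simple check on the 20-bit code.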
[0113] The 20-bit code is further packaged into 3 bytes according to the format illustrated in Table 3.

Table 3. Three Byte VARIS Code

            b7        b6        b5   b4   b3        b2   b1   b0
1st Byte    (unused)  (unused)  Word0 B             Word0 A
2nd Byte    Word2 (b7-b4)                 Word1 (b3-b0)
3rd Byte    VID_EN    (unused)  CRC (b5-b0)



[0114] The three byte VARIS code is constructed by the ARM as part of the initialization process. The ARM calculates
two VARIS codes corresponding to the two possible aspect ratios. The proper code is selected based on the aspect
ratio from the bitstream extracted by the video decoder. The user can set VID_EN to signal the NTSC/PAL encoder to
enable (1) or disable (0) the VARIS code. The transmission order is the 1st byte first and it is transmitted during the
non-active video line and before the transmission of video data.
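
The packing into three bytes might look like the sketch below. The bit positions are an assumption read from Table 3 (Word0 A in b2-b0 and Word0 B in b5-b3 of the first byte, Word1 in b3-b0 and Word2 in b7-b4 of the second, CRC in b5-b0 and VID_EN in b7 of the third); the function name is hypothetical.

```python
def pack_varis(word0_a: int, word0_b: int, word1: int, word2: int,
               crc: int, vid_en: int) -> bytes:
    """Pack the VARIS fields into three bytes (assumed layout, see lead-in)."""
    fields = (
        ("word0_a", word0_a, 3), ("word0_b", word0_b, 3),
        ("word1", word1, 4), ("word2", word2, 4),
        ("crc", crc, 6), ("vid_en", vid_en, 1),
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    byte1 = (word0_b << 3) | word0_a   # b7-b6 left at zero
    byte2 = (word2 << 4) | word1
    byte3 = (vid_en << 7) | crc        # b6 left at zero
    return bytes([byte1, byte2, byte3])
```

Since the ARM prepares one code per aspect ratio at initialization, a caller would invoke this twice, once for each value of the aspect ratio bit, and select between the two results at run time.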

[0115] The timing of the VARIS output is shown in Figure 4. The timing of 4:2:2 and 4:4:4 digital video
output is shown in Figure 5.

[0116] The PCM audio output from the 'AV310 is a serial PCM data line, with associated bit and left/right clocks. 
[0117] PCM data is output serially on PCMOUT using the serial clock ASCLK. ASCLK is derived from the PCM clock,
PCMCLK, according to the PCM Select bits in the control register. The PCM clock must be the proper multiple of the
sampling frequency of the bitstream. The PCMCLK may be input to the device or internally derived from an 18.432
MHz clock, depending on the state of the PCM_SRC pin. The data output of PCMOUT alternates between the two
channels, as designated by LRCLK as depicted in Figure 6. The data is output most significant bit first. In the case of
18-bit output, the PCM word size is 24 bits. The first six bits are zero, followed by the 18-bit PCM value.
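
The 18-bit framing can be illustrated as follows. This is a sketch with hypothetical function names; it treats the sample as a raw 18-bit pattern and shows the bit order in which the 24-bit word would leave PCMOUT.

```python
def pcm18_to_word24(sample: int) -> int:
    """Place an 18-bit PCM pattern in a 24-bit word.
    The six most significant bits of the word remain zero."""
    if not 0 <= sample < (1 << 18):
        raise ValueError("sample must be an 18-bit pattern")
    return sample


def word24_bits_msb_first(word: int) -> list[int]:
    """Serial bit order on the output: most significant bit first."""
    return [(word >> (23 - i)) & 1 for i in range(24)]
```

For an all-ones sample, `word24_bits_msb_first(pcm18_to_word24(0x3FFFF))` yields six zeros followed by eighteen ones.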
[0118] The SPDIF output conforms to a subset of the AES3 standard for serial transmission of digital audio data.
The SPDIF format is a subset of the minimum implementation of AES3.

[0119] When the PCM_SRC pin is low, the 'AV310 generates the necessary output clocks for the audio data, phase
locked to the input bitstream. The clock generator requires an 18.432 MHz external VCXO and outputs a control voltage
that can be applied to the external loop filter and VCXO to produce the required input. The clock generator derives the
correct output clocks, based on the contents of the audio control register bits PCMSEL1-0, as shown in the following
table.



Table 4. Audio Clock Frequencies

PCMSEL1-0    Description                       LRCLK (kHz)    ASCLK (MHz)    PCMCLK (MHz)
00           16 bit PCM, no oversampling       48             1.5360         1.5360
01           16 bit PCM, 256 x oversampling    48             1.5360         12.288
10           18 bit PCM, no oversampling       48             2.304          2.304
11           18 bit PCM, 384 x oversampling    48             2.304          18.432



[0120] Maximum clock jitter will not exceed 200 ps RMS. An example circuit is shown in Figure 7.
[0121] When PCM_SRC is high, the 'AV310 expects the correct PCM oversampling clock frequency to be input on
PCMCLK.

[0122] The SDRAM must be 16-bit wide SDRAM. The 'AV310 provides control signals for up to two SDRAMs. Any
combination of 4, 8, or 16 Mbit SDRAMs may be used, provided they total at least 16 Mbits. The SDRAM must operate
at an 81 MHz clock frequency and have the same timing parameters as the TI TMS626162, a 16 Mbit SDRAM.
[0123] The extension bus interface is a 16-bit bi-directional data bus with a 25-bit address for byte access. It also
provides 3 external interrupts, each with its own acknowledge signal, and a wait line. All the external memories or I/O
devices are mapped to the 32-bit address space of the ARM. There are seven internally generated Chip Selects
(CSx) for EEPROM memory, DRAM, modem, front panel, front end control, parallel output port, and 1394 Link device.
Each CS has its own defined memory space and a programmable wait register which has a default value of 1. The number
of wait states depends on the content of the register, with a minimum of one wait state. The EXTWAIT signal can also
be used to lengthen the access time if a slower device exists in that memory space.

[0124] The Extension Bus supports the connection of 7 devices using the pre-defined chip selects. Additional devices
may be used by externally decoding the address bus. The following table shows the name of the device, its chip select,
address range, and programmable wait state. Every device is required to have tri-stated data outputs within 1 clock
cycle following the removal of chip-select.



Table 5. Extension Bus Chip Select

Chip Select    Byte Address Range       Wait State    Device
CS1            0200 0000 - 03FF FFFF    1-5           EEPROM (up to 32 MBytes)
CS2            0400 0000 - 05FF FFFF    N/A           DRAM (up to 32 MBytes)
CS3            0600 0000 - 07FF FFFF    1-7           Modem
CS4            0800 0000 - 09FF FFFF    1-7           Front Panel
CS5            0A00 0000 - 0BFF FFFF    1-7           Front End Device
CS6            0C00 0000 - 0DFF FFFF    1-7           1394 Link Device
CS7            0E00 0000 - 0FFF FFFF    1-4           Parallel Data Port



[0125] CS1 is intended for ARM application code, but writes will not be prevented. 

[0126] CS2 is read/write accessible by the ARM. It is also accessed by the TC for TPP and bitBLT DMA transfers. 
[0127] CS3, CS4, CS5, and CS6 all have the same characteristics. The ARM performs reads and writes to these
devices through the Extension Bus.
[0128] CS7 is read and write accessible by the ARM. It is also accessed by the TC for TPP DMAs, for which it is write
only. The parallel port is one byte wide and it is accessed via the least significant byte.

[0129] The Extension Bus supports connection to external EEPROM, SRAM, or ROM memory and DRAM with its
16-bit data and 25-bit address. It also supports DMA transfers to/from the Extension Bus. DMA transfers within the
extension bus are not supported. However, they may be accomplished by DMA to the SRAM, followed by DMA to the
extension bus. Extension Bus read and write timing are shown in Figure 8 (read) and Figure 9 (write), both with two
programmable wait states. The number of wait states can be calculated by the following formula:

# of wait states = round_up[((CS_delay + device_cycle_time)/24) - 1]

For example, the CS_delay on the chip is 20 nsec. A device with 80 nsec read timing will need 4 wait states.
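
The wait state formula can be checked with a short calculation. This sketch uses a hypothetical function name; the 24 ns divisor comes from the formula itself, and the lower bound of one wait state reflects the stated minimum for the programmable wait register.

```python
import math


def wait_states(cs_delay_ns: float, device_cycle_ns: float,
                period_ns: float = 24) -> int:
    """round_up[((CS_delay + device_cycle_time) / 24) - 1],
    clamped to the stated minimum of one wait state."""
    raw = math.ceil((cs_delay_ns + device_cycle_ns) / period_ns - 1)
    return max(1, raw)
```

With the figures from the example, (20 + 80) / 24 - 1 is about 3.17, which rounds up to 4 wait states.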

[0130] There are three interrupt lines and three interrupt acknowledges in the 'AV310. These interrupts and interrupts
from other modules are handled by a centralized interrupt handler. The interrupt mask and priority are managed by
the firmware. The three extension bus interrupts are connected to three different IRQs. When the interrupt handler on
the ARM begins servicing one of these IRQs, it should first issue the corresponding EXTACK signal. At the completion
of the IRQ, the ARM should reset the EXTACK signal.

[0131] The EXTWAIT signal is an alternative way for the ARM to communicate with slower devices. It can be used
together with the programmable wait state, but it has to become active before the programmable wait cycle expires.
The total number of wait states should not exceed the maximum allowed from Table 5. If the combined total of wait states
exceeds its maximum, the decoder is not guaranteed to function properly. When a device needs to use the EXTWAIT
signal, it should set the programmable wait state to at least 2. Since the EXTWAIT signal has the potential to stall the
whole decoding process, the ARM will cap its waiting to 490 nanoseconds. Afterwards, the ARM assumes the device
that generated the EXTWAIT has failed and will ignore EXTWAIT from then on. Only a software or hardware reset can
activate the EXTWAIT signal again. The timing diagram of a read with the EXTWAIT signal on is shown in Figure 10.
[0132] The Extension Bus supports access to 70 ns DRAM with 2 wait states. The DRAM must have a column address
that is 8-bit, 9-bit, or 10-bit. The DRAM must have a data width of 8 or 16 bits. Byte access is allowed even when the
DRAM has a 16 bit data width. The system default DRAM configuration is 9-bit column address and 16-bit data width.
The firmware will verify the configuration of DRAM during start up.

[0133] The 'AV310 includes an Inter Integrated Circuit (I2C) serial bus interface that can act as either a master
(default) or slave. Only the 'standard mode' (100 kbit/s) I2C-bus system is implemented; 'fast mode' is not supported.
The interface uses 7-bit addressing. When in slave mode, the address of the 'AV310 is programmed by the API.
[0134] Timing for this interface matches the standard timing definition of the I2C bus.

[0135] The 'AV310 includes two general purpose 2-wire UARTs that are memory mapped and fully accessible by
application programs. The UARTs operate in asynchronous mode only and support baud rates of 1200, 2400, 4800,
9600, 14400, 19200 and 28800 bps. The outputs of the UARTs are digital and require external level shifters for RS232
compliance.

[0136] The IR, RF, and SIRCSI ports require a square wave input with no false transitions; therefore, the signal must
be thresholded prior to being applied to the pins. The interface will accept an IR, RF, or SIRCSI data stream up to a
frequency of 1.3 kHz. Although more than one may be active at any given time, only one IR, RF, or SIRCSI input will
be decoded. Decoding of the IR, RF, and SIRCSI signals will be done by a combination of hardware and software. See
the Communications Processor Module for further details.

[0137] SIRCSO outputs the SIRCSI or IR input or application-generated SIRCSO codes.

[0138] The 'AV310 provides a dedicated data interface for 1394. To complete the implementation, the 'AV310
requires an external packetizer, Link layer, and Physical layer devices. Figure 11 depicts the connection.
[0139] The control/command to the packetizer or the Link layer interface device is transmitted via the Extension Bus.
The 1394 data is transferred via the 1394 interface which has the following 14 signals:

Table 6. 1394 Interface Signals

Signal       I/O    Description
PDATA (8)    I/O    8 bit data
PWRITE       O      if PWRITE is high (active) the 'AV310
PPACEN       I/O    asserted at the beginning of a packet
PREADREQ            asserted (active high) if the Link
PREAD (1)    O      if PREAD is high (active) the 'AV310
CLK40 (1)    O      40.5 MHz clock. Wait states can be used
PERROR       I/O    indicates a packet error



[0140] In recording mode, the 'AV310 will send either encrypted or clean packets to the 1394 interface. The packet
is transferred as it comes in. When recording encrypted data, the TPP will send each byte directly to the 1394 interface
and bypass the DES module. In the case of recording decrypted data, the TPP will send the packet payload to the
DES module, then forward a block of packets to the 1394 interface. The interface sends the block of packets out byte
by byte. No processing will be done to the packet during recording, except setting the encrypt bit to the proper state.
In particular, the TPP will not remove CWP from the Auxiliary packet. During playback mode, the packet coming from
the interface will go directly into the TPP module. Figure 12 shows the functional block diagram of the data flow between
the TPP, DES, and 1394 interface. The packet coming out from TPP can go either to the 1394 interface or to the RAM
through the Traffic Controller, or to both places at the same time. This allows the 'AV310 to decode one program while
recording from 1 to all 32 possible services from a transponder.

[0141] Figure 13 and Figure 14 depict the read and write timing relationships on the 1394 interface.
[0142] During recording, if the DERROR signal from the front end interface goes high in the middle of a packet, it is
forwarded to the PERROR pin. If DERROR becomes active in between packets, then a PERROR signal will be
generated during the transfer of the next packet for at least one PDATA cycle.

[0143] During playback mode, the external 1394 device can only raise the PERROR signal when the PPACEN is
active to indicate either error(s) in the current packet or that there are missing packet(s) prior to the current one.
PERROR is ignored unless the PPACEN is active. The PERROR signal should stay high for at least two PCLK cycles.
There should be at most one PERROR signal per packet.

[0144] The 'AV310 requires a hardware reset on power up. Reset of the device is initiated by pulling the RESET pin
low, while the clock is running, for at least 100 ns. The following actions will then occur: input data on all ports will be
ignored; external memory is sized; data pointers are reset; all modules are initialized and set to a default state; the
TPP tables are initialized; the audio decoder is set for 16 bit output with 256 x oversampling; the OSD background
color is set to blue and video data is selected for both the analog and digital outputs; MacroVision is disabled; and the
I2C port is set to master mode.

[0145] When the reset sequence is finished, the device will begin to accept data. All data input prior to the end of
the reset sequence will be ignored.

[0146] JTAG boundary scan is included in the 'AV310. Five pins (including a test reset) are used to implement the
IEEE 1149.1 (JTAG) specification. The port includes an 8-bit instruction register used to select the instruction. This
register is loaded serially via the TDI input. Four instructions are supported, all others are ignored: Bypass, Extest,
Intest and Sample.

[0147] Timing for this interface conforms to the IEEE 1149.1 specification.

[0148] Features of the ARM/CPU module: runs at 40.5 MHz; supports byte (8-bit), half-word (16-bit), and word
(32-bit) data types; reads instructions from on-chip ROM or from the Extension Bus; can switch between ARM (32-bit)
or Thumb (16-bit) instruction mode; 32-bit data and 32-bit address lines; 7 processing modes; and two interrupts, FIQ
and IRQ.

[0149] The CPU in the 'AV310 is a 32 bit RISC processor, the ARM7TDMI/Thumb, which has the capability to
execute instructions in 16 or 32 bit format at a clock frequency of 40.5 MHz. The regular ARM instructions are exactly
one word (32-bit) long, and the data operations are only performed on word quantities. However, LOAD and STORE
instructions can transfer either byte or word quantities.

[0150] The Thumb uses the same 32 bit architecture with a 16-bit instruction set. That is, it retains the 32-bit
performance but reduces the code size with 16-bit instructions. With 16-bit instructions, Thumb still gives 70 - 80% of the
performance of the ARM when running ARM instructions from 32-bit memory. In this document, ARM and Thumb are
used interchangeably.

[0151] ARM uses a LOAD and STORE architecture, i.e. all operations are on the registers. ARM has 6 different
processing modes, with 16 32-bit registers visible in user mode. In the Thumb state, there are only 8 registers available
in user mode. However, the high registers may be accessed through special instructions. The instruction pipeline is
three stage, fetch -> decode -> execute, and most instructions only take one cycle to execute. Figure 15 shows the
data path of the ARM processor core.

[0152] The ARM CPU is responsible for managing all the hardware and software resources in the 'AV310. At power
up the ARM will verify the size of external memory. Following that, it will initialize all the hardware modules by setting
up control registers, tables, and reset data pointers. It then executes the default firmware from internal ROM. A set of
run-time library routines provides the access to the firmware and hardware for user application programs. The
application programs are stored in external memory attached to the Extension Bus.

[0153] During normal operation the ARM constantly responds, based on a programmable priority, to interrupt requests
from any of the hardware modules and devices on the Extension Bus. The kinds of interrupt services include transport
packet parsing, program clock recovery, traffic controller and OSD service requests, service or data transfer requests
from the Extension Bus and Communication Processor, and service requests from the Audio/Video decoder.
[0154] Features of the Traffic Controller Module: manages interrupt requests; authorizes and manages DMA
transfers; provides SDRAM interface; manages Extension Bus; provides memory access protection; manages the data flow
between processors and memories: TPP/DES to/from internal Data RAM; Data RAM to/from Extension Bus; SDRAM
to OSD; OSD to/from Data RAM; Audio/Video Decoder to/from SDRAM; and SDRAM to/from Data RAM; generates
chip selects (CS) for all internal modules and devices on the Extension Bus; generates programmable wait states for
devices on the Extension Bus; and provides 3 breakpoint registers and 64 32-bit patch RAM.
[0155] Figure 16 depicts the data flow managed by the Traffic Controller. 

[0156] The SDRAM interface supports 12 nanoseconds 16-bit data width SDRAM. It has two chip selects that allow
connections to a maximum of two SDRAM chips. The minimum SDRAM size required by the decoder is 16 Mbit. Other
supported sizes and configurations are:

16 Mbit -> one 16 Mbit SDRAM
20 Mbit -> one 16 Mbit and one 4 Mbit SDRAM
24 Mbit -> one 16 Mbit and one 8 Mbit SDRAM
32 Mbit -> two 16 Mbit SDRAM

The access to the SDRAM can be by byte, half word, single word, continuous block, video line block, or 2D macroblock.
The interface also supports decrement mode for bitBLT block transfer.
[0157] The two chip selects correspond to the following address ranges:

SCSI OxFEOO 0000 - OxFEIF FFFF. 



SCS2 -» 0xFE20 0000 - 0xFE3F FFFF 
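
The address decoding implied by the two ranges above can be sketched as follows. This is only an illustration: the function name and the dictionary are hypothetical, and the only values taken from the text are the two address windows.

```python
# Address windows for the two SDRAM chip selects, as listed in paragraph [0157].
# The names SCS_RANGES and chip_select_for are illustrative, not part of the device.
SCS_RANGES = {
    "SCS1": (0xFE000000, 0xFE1FFFFF),
    "SCS2": (0xFE200000, 0xFE3FFFFF),
}


def chip_select_for(address):
    """Return the chip select whose window contains address, or None."""
    for name, (start, end) in SCS_RANGES.items():
        if start <= address <= end:
            return name
    return None
```

Each window spans 2 Mbytes, which matches one 16 Mbit SDRAM per chip select.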
During decoding, the 'AV310 allocates the 16 Mbit SDRAM for NTSC mode according to Table 7.

Table 7. Memory Allocation of 16 Mbit SDRAM (NTSC)

Starting Byte Address    Ending Byte Address    Usage
0x000000                 0x0003FF               Pointers
0x000400                 0x000FFF               Tables and FIFOs
0x001000                 0x009FFF               Video Microcode (36,864 bytes)
0x00A000                 0x0628FF               Video Buffer (2,902,008 bits)
0x062900                 0x0648FF               Audio Buffer (65,536 bits)
0x064900                 0x0E31FF               First Reference Frame (518,400 bytes)
0x0E3200                 0x161CFF               Second Reference Frame (518,400 bytes)
0x161D00                 0x1C9DFF               B Frame (426,240 bytes, 0.82 frames)
0x1C9E00                 0x1FFFFF               OSD or other use (222,210 bytes)*

* These values are for the current DSS specification. In the latest proposed specification, the VBV buffer size is
reduced to 1,835,008 bits, giving 355,586 bytes for OSD or other use.
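
As a consistency check on the memory map of Table 7, the following minimal sketch computes each region's size from its address range and verifies that the regions tile the 16 Mbit SDRAM without gaps. The addresses are as read from Table 7, with illegible entries inferred from the neighbouring rows; the function names are hypothetical. Sizes are derived from the address ranges only and need not equal the parenthetical figures in the table exactly.

```python
# (start, end, usage) rows taken from Table 7; addresses are byte offsets in SDRAM.
# Entries that were illegible in the source are inferred from contiguity (assumption).
TABLE_7 = [
    (0x000000, 0x0003FF, "Pointers"),
    (0x000400, 0x000FFF, "Tables and FIFOs"),
    (0x001000, 0x009FFF, "Video Microcode"),
    (0x00A000, 0x0628FF, "Video Buffer"),
    (0x062900, 0x0648FF, "Audio Buffer"),
    (0x064900, 0x0E31FF, "First Reference Frame"),
    (0x0E3200, 0x161CFF, "Second Reference Frame"),
    (0x161D00, 0x1C9DFF, "B Frame"),
    (0x1C9E00, 0x1FFFFF, "OSD or other use"),
]


def region_sizes(table):
    """Return a dict mapping each usage to its size in bytes."""
    return {usage: end - start + 1 for start, end, usage in table}


def is_contiguous(table):
    """True if every region starts one byte after the previous one ends."""
    return all(table[i][1] + 1 == table[i + 1][0] for i in range(len(table) - 1))
```

For example, `region_sizes(TABLE_7)["First Reference Frame"]` corresponds to one 720 x 480 frame at 12 bits per pixel.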



[0158] However, it is also within the scope of the present invention to put the VBV buffer in optional memory on the
extension bus 300 and thereby free up the SDRAM memory by the amount of the VBV buffer. This means that the
SDRAM is allocated in a different manner than that of Table 7; that is, the OSD memory size may be expanded or any
of the other blocks expanded.

[0159] Interrupt requests are generated from internal modules like the TPP, OSD, A/V decoder and Communication
Processor, and devices on the Extension Bus. Some of the requests are for data transfers to internal RAM, while others
are true interrupts to the ARM CPU. The Traffic Controller handles data transfers, and the ARM provides services to
true interrupts. The interrupts are grouped into FIQ and IRQ. The system software will use FIQ, while the application
software will use IRQ. The priorities for FIQs and IRQs are managed by the firmware.

[0160] The SDRAM is used to store system level tables, video and audio bitstreams, reconstructed video images,
OSD data, and video decoding codes, tables, and FIFOs. The internal Data RAM stores temporary buffers, OSD window
attributes, keys for conditional access, and other tables and buffers for firmware. The TC manages two physical DMA
channels, but only one of them, the General Purpose DMA, is visible to the user. The user has no knowledge of the
DMAs initiated by the TPP, the video and audio decoder, and the OSD module. The General Purpose DMA includes
ARM-generated and bitBLT-generated DMAs. The TC can accept up to 4 general DMAs at any given time. Table 8
describes the allowable General Purpose DMA transfers.



Table 8. DMA Sources and Destinations

DMA Transfer     SDRAM    Data RAM    Extension Bus
SDRAM            NO       YES         NO
Data RAM         YES      NO          YES
Extension Bus    NO       YES         NO


Note that there is no direct DMA transfer to/from the Extension Bus memories from/to the SDRAM. However,
the user can use the bitBLT hardware, which uses Data RAM as an intermediate step, for this purpose. The only constraint
is that the block being transferred has to start at a 32-bit word boundary.
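
The transfer rules of Table 8 and the alignment constraint noted above can be expressed as a small lookup, sketched here under the assumption that a transfer is identified simply by a (source, destination) pair of memory names. The identifiers are hypothetical and do not correspond to any firmware interface described in the text.

```python
# Allowed General Purpose DMA transfers, per Table 8 (the table is symmetric).
DMA_ALLOWED = {
    ("SDRAM", "Data RAM"),
    ("Data RAM", "SDRAM"),
    ("Data RAM", "Extension Bus"),
    ("Extension Bus", "Data RAM"),
}


def dma_allowed(source, destination):
    """True if Table 8 permits a direct DMA transfer between the two memories."""
    return (source, destination) in DMA_ALLOWED


def bitblt_start_ok(byte_address):
    """A bitBLT transfer staged through Data RAM must start on a 32-bit word boundary."""
    return byte_address % 4 == 0
```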



[0161] Features of the TPP Module: parses transport bitstreams; accepts bitstream either from the front end device
or from the 1394 interface; performs System Clock Reference (SCR) recovery; supports transport stream up to 40
Mbits-per-second; accepts 8-bit parallel input data; supports storage of 32 SCID; lost-packet detection; provides
decrypted or encrypted packets directly to the 1394 interface; and internal descrambler for DSS with the Data Encryption
Standard (DES) implemented in hardware.

[0162] The TPP accepts packets byte by byte. Each packet contains a unique ID, SCID, and the TPP extracts those
packets containing the designated ID numbers. It processes the headers of transport packets and transfers the payload
or auxiliary packets to the internal RAM via the DES hardware and Traffic Controller. Special firmware running on the
ARM handles DES key extraction and activates DES operation. The ARM CPU performs further parsing on auxiliary
packets stored in the internal RAM. The ARM and TPP together also perform SCR clock recovery. Figure 17 is an
example circuit for the external VCXO. The output from the 'AV310 is a digital pulse with 256 levels.
[0163] The Conditional Access and DES block is part of the packet header parsing function. A CF bit in the header
indicates whether the packet is clean or has been encrypted. The clean packet can be forwarded to the internal RAM
directly, while the encrypted one needs to go through the DES block for decryption. The authorization and decryption
key information are transmitted via Control Word Packet (CWP). An external Smart Card guards this information and
provides the proper key for the DES to work.

[0164] The 1394 Interface is directly connected to the TPP/DES module. At the command of the user program, the
TPP/DES can send either clean or encrypted packets to the 1394 interface. The user can select up to 32 services to
record. If the material is encrypted, the user also needs to specify whether to record clean or encrypted video. In
recording mode, the TPP will appropriately modify the packet header if decrypted mode is selected; in encrypted mode,
the packet headers will not be modified. During the playback mode, the 1394 interface forwards each byte as it comes
in to the TPP. The TPP parses the bitstream the same way it does data from the front end.

[0165] Features of Video Decoder Module: real-time video decoding of MPEG-2 Main Profile Main Level and MPEG-1;
error detection and concealment; internal 90 KHz/27 MHz System Time Clock; sustained input rate of 16 Mbps;
supports Trick Mode with full trick mode picture; provides 1/4 and 1/16 decimated size picture; extracts Closed Caption
and other picture user data from the bitstream; 3:2 pulldown in NTSC mode; supports the display formats of Table 9
with polyphase horizontal resampling and vertical chrominance filtering; pan-and-scan for 16:9 source material
according to both DSS and MPEG syntax; high level command interface; and synchronization using Presentation
Time Stamps (PTS).

Table 9. Supported Video Resolutions

NTSC (30 Hz)               PAL (25 Hz)
Source       Display       Source       Display
720 x 480    720 x 480     720 x 576    720 x 576
704 x 480    720 x 480     704 x 576    720 x 576
544 x 480    720 x 480     544 x 576    720 x 576
480 x 480    720 x 480     480 x 576    720 x 576
352 x 480    720 x 480     352 x 576    720 x 576
352 x 240    720 x 480     352 x 288    720 x 576





[0166] The Video Decoder module receives a video bitstream from SDRAM. It also uses SDRAM as its working
memory to store tables, buffers, and reconstructed images. The decoding process is controlled by a RISC engine which
accepts high level commands from the ARM. In that fashion, the ARM acts as an external host to initialize and
control the Video Decoder module. The output video is sent to the OSD module for further blending with OSD data.
[0167] Besides normal bitstream decoding, the Video decoder also extracts from the picture layer user data the
Closed Caption (CC), the Extended Data Services (EDS), the Presentation Time Stamps (PTS) and Decode Time
Stamps, the pan_and_scan, the fields display flags, and the no_burst flag. These data fields are specified by the DSS.
The CC and EDS are forwarded to the NTSC/PAL encoder module and the PTS is used for presentation
synchronization. The other data fields form DSS-specific constraints on the normal MPEG bitstream, and they are used to update
information obtained from the bitstream.

[0168] When the PTS and SCR (System Clock Reference) do not match within tolerance, the Video decoder will
either redisplay or skip a frame. At that time, the CC/EDS will be handled as follows: if redisplaying a frame, the second
display will not contain CC/EDS; if skipping a frame, the corresponding CC/EDS will also be skipped. During trick mode
decoding, the video decoder repeats the following steps: searches for a sequence header followed by an I picture;
ignores the video buffer underflow error; and continuously displays the decoded I frame.
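
The redisplay/skip decision of paragraph [0168] might be sketched as below. The text does not say which mismatch direction leads to which action, nor does it give the tolerance, so both are assumptions here: a frame whose PTS is ahead of the SCR is treated as early (the previous picture is redisplayed) and a frame whose PTS is behind is treated as late (skipped). Counter wraparound is ignored, and all names are hypothetical.

```python
def sync_action(pts, scr, tolerance):
    """Return 'display', 'redisplay', or 'skip' for one frame.

    pts, scr and tolerance are in the same clock units (for example 90 KHz ticks).
    Assumption: PTS ahead of SCR means the frame is early, so the previous
    picture is shown again; PTS behind SCR means the frame is late and is skipped.
    """
    difference = pts - scr
    if abs(difference) <= tolerance:
        return "display"
    return "redisplay" if difference > 0 else "skip"


def cc_eds_sent(action):
    """CC/EDS accompany a normal display only: the second display of a
    redisplayed frame carries none, and a skipped frame's CC/EDS are dropped."""
    return action == "display"
```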
[0169] Note that trick mode I frame data has to contain the whole frame instead of only several intra slices. 
[0170] The Video decoder accepts the high level commands detailed in Table 10. 



Table 10. Video Decoder Commands

Play            normal decoding
Freeze          normal decoding but continue to display the last picture
Stop            stops the decoding process. The display continues with the last picture
Scan            searches for the first I picture, decodes it, continuously displays it, and flushes the buffer
NewChannel      for channel change. This command should be preceded by a Stop command
Reset           halts execution of the current command. The bitstream buffer is flushed and the video decoder
                performs an internal reset
Decimate 1/2    continue normal decoding and displaying of a 1/2 x 1/2 decimated picture (used by OSD API)
Decimate 1/4    continue normal decoding and displaying of a 1/4 x 1/4 decimated picture (used by OSD API)



[0171] The following table shows the supported aspect ratio conversions.

Table 11. Aspect Ratio Conversions

            Display
Source      4:3         16:9
4:3         YES         NO
16:9        PAN-SCAN    YES



[0172] The Pan-Scan method is applied when displaying a 16:9 source video on a 4:3 device. The Pan-Scan location
is specified to the 1, 1/2, or 1/4 sample if the source video has the full size, 720/704 x 480. If the sample size is smaller
than full, then the Pan-Scan location is specified only to the exact integer sample. Note that the default display format
output from the 'AV310 is 4:3. Outputting 16:9 video is only available when the image size is 720/704 x 480. A reset is also
required when switching between a 4:3 display device and a 16:9 one.

[0173] The 1/2 and 1/4 decimation, in each dimension, is supported for various size images in 4:3 or 16:9 format.
The following table provides the details.

Table 12. Decimation Modes

                 Sample Size
                 4:3                    16:9
Source           Full    1/2    1/4     Full    1/2    1/4
720/704 x 480    YES     YES    YES     YES     YES    YES
544 x 480        YES     YES    YES     YES     YES    YES
480 x 480        YES     YES    YES     YES     YES    YES
352 x 480        YES     YES    YES     YES     YES    YES
352 x 240        YES     YES    YES     NO      NO     NO



Features of the audio decoder module: decodes MPEG audio layers 1 and 2; supports all MPEG-1 and MPEG-2 data
rates and sampling frequencies, except half frequency; provides automatic audio synchronization; supports 16- and
18-bit PCM data; outputs in both PCM and SPDIF formats; generates the PCM clock or accepts an external source;
provides error concealment (by muting) for synchronization or bit errors; and provides frame-by-frame status
information.

[0174] The audio module receives MPEG compressed audio data from the traffic controller, decodes it, and outputs
audio samples in PCM format. The ARM CPU initializes and controls the audio decoder via a control register and can read
status information from the decoder's status register.
[0175] Audio frame data and PTS information are stored in the SDRAM in packet form. The audio module will decode
the packet to extract the PTS and audio data.

[0176] The ARM can control the operation of the audio module via a 32-bit control register. The ARM may reset or
mute the audio decoder, select the output precision and oversampling ratio, and choose the output format for dual
channel mode. The ARM will also be able to read status information from the audio module. One (32-bit) register
provides the MPEG header information and sync, CRC, and PCM status.

[0177] The audio module has two registers: a read/write control register and a read-only status register. The registers
are defined below.



Table 13. Audio Module Registers

Register #    Location    Description
0             31:6        Reserved (set to 0)
              5:4         Select (output precision and oversampling ratio)
                            00 = 16 bit, no oversampling
                            01 = 16 bit, 256 x oversampling
                            10 = 18 bits, no oversampling
                            11 = 18 bits, 384 x oversampling
              3:2         Dual Channel Mode Output Mode Select
                            00 = Ch 0 on left, Ch 1 on right
                            01 = Ch 0 on both left and right
                            10 = Ch 1 on both left and right
                            11 = Reserved
              1           Mute
                            0 = Normal operation
                            1 = Mute audio output
              0           Reset
                            0 = Normal operation
                            1 = Reset audio module
1             31          Stereo Mode
(Status                     0 = all other
Register -                  1 = dual mode
R only)       30:29       Sampling Frequency
                            00 = 44.1 KHz
                            01 = 48 KHz
                            10 = 32 KHz
                            11 = Reserved
              28:27       De-emphasis Mode
                            00 = None
                            01 = 50/15 microseconds
                            10 = Reserved
                            11 = CCITT J.17
              26          Synchronization Mode
                            0 = Normal operation
                            1 = Sync recovery mode
              25          CRC Error
                            0 = No CRC error or CRC not enabled in bitstream
                            1 = CRC error found
              24          PCM Underflow
                            0 = Normal operation
                            1 = PCM output underflowed
              23:4        Bits 19-0 of the MPEG header
              3:0         Version number of the audio decoder
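
The register layout of Table 13 lends itself to a short illustration of how the ARM software might pack the control word and unpack the status word. This is a sketch only: the function names are hypothetical, and placing the precision/oversampling select field in bits 5:4 is an assumption inferred from the reserved range 31:6 and the following field at 3:2.

```python
# Sampling frequency codes from Table 13 (status register bits 30:29); 0b11 is reserved.
SAMPLING_KHZ = {0b00: 44.1, 0b01: 48.0, 0b10: 32.0}


def make_control(select, dual_mode, mute, reset):
    """Pack the control register (register 0); bits 31:6 stay 0.

    Assumption: the precision/oversampling select field occupies bits 5:4.
    """
    return ((select & 0x3) << 4) | ((dual_mode & 0x3) << 2) | ((mute & 1) << 1) | (reset & 1)


def decode_status(value):
    """Unpack the read-only status register (register 1) into named fields."""
    return {
        "dual_mode": bool((value >> 31) & 1),
        "sampling_khz": SAMPLING_KHZ.get((value >> 29) & 0x3),  # None if reserved
        "de_emphasis": (value >> 27) & 0x3,
        "sync_recovery": bool((value >> 26) & 1),
        "crc_error": bool((value >> 25) & 1),
        "pcm_underflow": bool((value >> 24) & 1),
        "mpeg_header_bits": (value >> 4) & 0xFFFFF,
        "version": value & 0xF,
    }
```

For instance, `make_control(0b11, 0b01, 1, 0)` would request 18-bit output with 384 x oversampling, channel 0 on both outputs, and muted audio.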

Features of the OSD module: supports up to 8 hardware windows, one of which can be used for a cursor; all the
non-overlapped windows can be displayed simultaneously; overlapped windows are displayed obstructively with the highest
priority window on top; provides a hardware window-based rectangle cursor with programmable size and blinking
frequency; provides a programmable background color, which defaults to blue; supports 4 window formats (empty
window for decimated video; bitmap; YCrCb 4:4:4 graphics component; and YCrCb 4:2:2 CCIR 601 component);
supports blending of bitmap, YCrCb 4:4:4, or YCrCb 4:2:2 with motion video and with an empty window; supports window
mode and color mode blending; provides a programmable 256-entry Color Look Up Table; outputs motion video or
mixture with OSD in a programmable 4:2:2 or 4:4:4 digital component format; provides motion video or mixture with OSD
to the on-chip NTSC/PAL encoder; and provides graphics acceleration capability with bitBLT hardware. Each hardware
window has the following attributes: window position (any even pixel horizontal position on screen; windows with
decimated video have to start from an even numbered video line also); window size: from 2 to 720 pixels wide (even values
only) and 1 to 576 lines; window base address; data format (bitmap, YCrCb 4:4:4, YCrCb 4:2:2, and empty); bitmap
resolution (1, 2, 4, and 8 bits per pixel); full or half resolution for bitmap and YCrCb 4:4:4 windows; bitmap color palette
base address; blend enable flag; 4 or 16 levels of blending; transparency enable flag for YCrCb 4:4:4 and YCrCb 4:2:2;
and output channel control.

[0178] The OSD module is responsible for managing OSD data from different OSD windows and blending them with
the video. It accepts video from the Video Decoder, reads OSD data from SDRAM, and produces one set of video
output to the on-chip NTSC/PAL Encoder and another set to the digital output that goes off the chip. The OSD module
defaults to standby mode, in which it simply sends video from the Video Decoder to both outputs. After being activated
by the ARM CPU, the OSD module, following the window attributes set up by the ARM, reads OSD data and mixes it
with the video output. The ARM CPU is responsible for turning on and off OSD operations. The bitBLT hardware which
is attached to the OSD module provides acceleration to memory block moves and graphics operations. Figure 18
shows the block diagram of the OSD module. The various functions of the OSD are described in the following
subsections.

[0179] The OSD data has variable size. In the bitmap mode, each pixel can be 1, 2, 4, or 8 bits wide. In the graphics
YCrCb 4:4:4 or CCIR 601 YCrCb 4:2:2 modes, it takes 8 bits per component, and the components are arranged
according to 4:4:4 (Cb/Y/Cr/Cb/Y/Cr) or 4:2:2 (Cb/Y/Cr/Y) format. In the case where RGB graphics data needs to be
used as OSD, the application should perform software conversion to Y/Cr/Cb before storing it. The OSD data is always
packed into 32-bit words and left justified. Starting from the upper left corner of the OSD window, all data will be packed
into adjacent 32-bit words. The dedicated bitBLT hardware expedites the packing and unpacking of OSD data for the
ARM to access individual pixels, and the OSD module has an internal shifter that provides pixel access.
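
The left-justified packing of bitmap pixels into 32-bit words described above can be illustrated in software as follows. This sketch assumes that "left justified" means the first pixel occupies the most significant bits of the word and that a final partial word is zero-padded on the right; the text does not spell out either detail, and the function names are hypothetical.

```python
def pack_pixels(pixels, bits_per_pixel):
    """Pack bitmap pixel values into 32-bit words, first pixel in the MSBs."""
    if bits_per_pixel not in (1, 2, 4, 8):
        raise ValueError("bitmap pixels are 1, 2, 4, or 8 bits wide")
    per_word = 32 // bits_per_pixel
    mask = (1 << bits_per_pixel) - 1
    words = []
    for start in range(0, len(pixels), per_word):
        word = 0
        for position, pixel in enumerate(pixels[start:start + per_word]):
            shift = 32 - bits_per_pixel * (position + 1)
            word |= (pixel & mask) << shift
        words.append(word)  # a short final group leaves the low bits at 0
    return words


def get_pixel(words, index, bits_per_pixel):
    """Extract pixel number index (0-based) from the packed words."""
    per_word = 32 // bits_per_pixel
    shift = 32 - bits_per_pixel * (index % per_word + 1)
    return (words[index // per_word] >> shift) & ((1 << bits_per_pixel) - 1)
```

The shift-and-mask in `get_pixel` plays the role that the text assigns to the internal shifter of the OSD module.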
[0180] In NTSC mode, the available SDRAM is able to store one of the following OSD windows with the size listed
in Table 14, with the current and proposed VBV buffer size for DBS.

Table 14. SDRAM OSD Window Size

              720 x 480 frames
bits/pixel    Current    Proposed
24            0.21       0.34
8             0.64       1.03
4             1.29       2.06
2             2.58       4.12
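
The entries of Table 14 appear to follow from dividing the SDRAM left over for OSD (222,210 bytes under the current VBV buffer size, 355,586 bytes under the proposed one, per the footnote to Table 7) by the storage needed for one 720 x 480 window at the given pixel depth. The sketch below reproduces that arithmetic; the function name is hypothetical and the interpretation is an inference from the two tables rather than something the text states.

```python
FRAME_PIXELS = 720 * 480  # one full-size NTSC OSD window


def osd_frames(free_bytes, bits_per_pixel):
    """Number of 720 x 480 OSD windows that fit in free_bytes of SDRAM."""
    frame_bytes = FRAME_PIXELS * bits_per_pixel / 8
    return free_bytes / frame_bytes
```

For example, `osd_frames(222210, 8)` evaluates to roughly 0.64 and `osd_frames(355586, 8)` to roughly 1.03, in line with the 8 bits/pixel row.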



[0181] An OSD window is defined by its attributes. Besides storing OSD data for a window into SDRAM, the
application program also needs to update window attributes and other setup in the OSD module as described in the following
subsections.

[0182] The CAM memory contains X and Y locations of the upper left and lower right corners of each window. The
application program needs to set up the CAM and enable selected OSD windows. The priority of each window is
determined by its location in the CAM. That is, the lower address window always has higher priority. In order to swap
the priority of windows, the ARM has to exchange the locations within the CAM.

[0183] The OSD module keeps a local copy of window attributes. These attributes allow the OSD module to calculate
the address for the OSD data, extract pixels of the proper size, control the blending factor, and select the output channel.
[0184] Before using bitmap OSD the application program has to initialize the 256 entry color look up table (CLUT).
The CLUT is mainly used to convert bitmap data into Y/Cr/Cb components. Since bitmap pixels can have either 1, 2,
4, or 8 bits, the whole CLUT can also be programmed to contain segments of smaller size tables, such as sixteen
separate, 16-entry CLUTs.

[0185] There are two blending modes. The window mode blending applies to OSD windows of type bitmap, YCrCb
4:4:4, and YCrCb 4:2:2. The color mode, pixel by pixel, blending is only allowed for the bitmap OSD. Blending always
blends OSD windows with real time motion video. That is, there is no blending among OSD windows except the empty
window that contains decimated motion video. In case of overlapping OSD windows the blending only occurs between
the top OSD window and the video. The blending is controlled by the window attributes Blend_En (2-bit), Blend_Level
(4-bit), and Trans_En (1-bit). Blend_En activates blending as shown in Table 15. In window mode all pixels are mixed
with the video data based on the level defined by the attribute Blend_Level. In the color mode the blending level is
provided in the CLUT. That is, the least significant bit of Cb and Cr provides the 4 level blending, while the last two bits
from Cb and Cr provide the 16 level blending. Transparency level, no OSD but only video, is achieved with the Trans_En
bit on and the OSD pixel containing all 0s.



Table 15. OSD Blending Control

Blend_En    Blending modes
00          Disable Blending
01          4 Level Color Blending
10          16 Level Color Blending
11          Window Mode Blending



[0186] A rectangular blinking cursor is provided using hardware window 0. With window 0, the cursor always appears
on top of other OSD windows. The user can specify the size of the cursor via window attribute. The activation of the
cursor, its color, and blinking frequency are programmable via control registers. When hardware window 0 is designated
as the cursor, only seven windows are available for the application. If a hardware cursor is not used, then the application
can use window 0 as a regular hardware window.

[0187] After the OSD windows are activated, each of them has an attribute, Disp_Ch_Cntl[1:0], that defines the
contents of the two output channels (the analog and digital video outputs) when the position of that window is currently
being displayed. The following table shows how to control output channels.

Table 16. OSD Module Output Channel Control

Disp_Ch_Cntl[1]    Disp_Ch_Cntl[0]    Channel 1 Digital Video Output    Channel 0 To NTSC/PAL Encoder
0                  0                  MPEG Video                        MPEG Video
0                  1                  MPEG Video                        Mixed OSD_Window
1                  0                  Mixed OSD_Window                  MPEG Video
1                  1                  Mixed OSD_Window                  Mixed OSD_Window



Example displays of these two output channels are shown in Figure 19. 
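
Table 16 amounts to one enable bit per output channel: bit 1 of Disp_Ch_Cntl selects the mixed output for channel 1 (digital video) and bit 0 selects it for channel 0 (NTSC/PAL encoder). A minimal sketch of that mapping, with a hypothetical function name:

```python
def output_channels(disp_ch_cntl):
    """Map the 2-bit Disp_Ch_Cntl value to the sources of the two outputs.

    Returns (channel 1 digital video output, channel 0 to NTSC/PAL encoder).
    """
    mixed, plain = "Mixed OSD_Window", "MPEG Video"
    channel1 = mixed if disp_ch_cntl & 0b10 else plain
    channel0 = mixed if disp_ch_cntl & 0b01 else plain
    return channel1, channel0
```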

[0188] The bitBLT hardware provides a faster way to move a block of memory from one space to the other. It reads
data from a source location, performs shift/mask/merge/expand operations on the data, and finally writes it to a
destination location. This hardware enables the following graphics functions: Set/Get Pixel; Horizontal/Vertical Line
Drawing; Block Fill; Font BitBLTing; Bitmap/graphic BitBLTing; and Transparency.
[0189] The allowable source and destination memories for bitBLT are defined in Table 17.

Table 17. Source and Destination Memories for BitBLT

                   Destination Memory
Source Memory      SDRAM    Ext_Bus Memory
SDRAM              YES      YES
Ext_Bus Memory     YES      YES



[0190] The types of source and destination OSD windows supported by the bitBLT are given in the following
table (HR stands for half resolution).

Table 18. Allowable BitBLT Window Formats

                      Destination OSD Window
Source OSD Window     YCrCb 4:4:4    YCrCb 4:4:4_HR    YCrCb 4:2:2    Bitmap    Bitmap_HR
YCrCb 4:4:4           YES            YES               NO             NO        NO
YCrCb 4:4:4_HR        YES            YES               NO             NO        NO
YCrCb 4:2:2           NO             NO                YES            NO        NO
Bitmap                YES            YES               NO             YES       YES
Bitmap_HR             YES            YES               NO             YES       YES

[0191] Since the bitmap allows resolutions of 1, 2, 4, or 8 bits per pixel, the bitBLT will drop the MSB bits or pad it
with 0s when swapping between windows of different resolution. For half-resolution OSD, the horizontal pixel dimension
must be an even number. For YCrCb 4:2:2 data, the drawing operation is always on 32-bit words, two adjacent pixels
that align with the word boundary.
[0192] In a block move operation, the block of data may also be transparent to allow text or graphic overlay. The
pixels of the source data will be combined with the pixels of the destination data. When transparency is turned on and
the value of the source pixel is non-zero, the pixel will be written to the destination. When the value of the pixel is zero,
the destination pixel will remain unchanged. Transparency is only allowed from bitmap to bitmap, and from bitmap to
YCrCb 4:4:4.
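
The transparency rule of the block move can be modelled in a few lines. This is a software sketch of the stated behaviour only, operating on flat lists of pixel values; it does not attempt to model the shift/mask/merge datapath of the bitBLT hardware, and the function name is hypothetical.

```python
def block_move(source, destination, transparent=False):
    """Return the destination pixels after a block move of equal-sized blocks.

    With transparency on, a zero-valued source pixel leaves the destination
    pixel unchanged; every non-zero source pixel overwrites it.
    """
    if len(source) != len(destination):
        raise ValueError("source and destination blocks must be the same size")
    if not transparent:
        return list(source)
    return [s if s != 0 else d for s, d in zip(source, destination)]
```

Overlaying text on a background in this model means encoding the glyph pixels as non-zero values and everything else as zero.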

[0193] Features of NTSC/PAL Encoder module: supports NTSC and PAL B, D, G/H, and I display formats; outputs
Y, C, and Composite video with 9-bit DACs; complies with the RS170A standard; supports MacroVision Anti-taping
function; provides Closed Caption, Extended Data Services, and aspect ratio VARIS encoding; and provides sync signals
with option to accept external sync signals.

[0194] This module accepts from the OSD module the video data that may have been blended with OSD data and
converts it to Y, C, and Composite analog outputs. The Closed Caption and Extended Data Services data are provided
by the Video decoder through a serial interface line. These data are latched into corresponding registers. The CC
encoder sends out Closed Caption data at video line 21 and Extended Data Services at video line 284. The ARM
initializes and controls this module via the ARM Interface block. It also sends the VARIS code to the designated registers,
from which it is encoded into video line 20. The ARM also turns on and off MacroVision through the ARM Interface
block. The default state of MacroVision is off.

[0195] Features of the Communications Processor module: provides two programmable timers; provides 3 UARTs
- one for Smart Card and two for general use; accepts IR, SIRCSI and RF signals; provides a SIRCSO output; provides
two general purpose I/Os; and manages I2C and JTAG interfaces.

[0196] This module contains a collection of buffers, control registers, and control logic for various interfaces, such
as UARTs, IR/RF, I2C, and JTAG. All the buffers and registers are memory mapped and individually managed by the
ARM CPU. Interrupts are used to communicate between these interface modules and the ARM CPU.
[0197] The 'AV310 has two general purpose timers which are user programmable. Both timers contain 16 bit counters
with 16 bit pre-scalers, allowing for timing intervals of 25 ns to 106 seconds. Each timer, timer0 and timer1, has an
associated set of control and status registers. These registers are defined in Table 19.

Table 19. Timer Control and Status Registers

Register Name    Read/Write    Description
tcrx             R/W           Timer Control Register
                                 31-6   Reserved (set to 0)
                                 5      tint_mask
                                          0 = enable interrupts
                                          1 = mask interrupts
                                 4      reserved (set to 1)
                                 3      reserved
                                 2      soft - soft stop:
                                          0 = reload counters on 0
                                          1 = stop timer on 0
                                 1      tss - timer stop:
                                          0 = start
                                          1 = stop
                                 0      trb - timer reload:
                                          0 = do not reload
                                          1 = reload the timer (read 0)
tddrx            W             Timer Divide Down (15-0). Contains the value for the pre-scaler to preload psc during
                               pre-scaler rollover. (Note: reading this register is equivalent to reading the prld register.)
prdx             W             Timer Period Register (15-0). Contains the value for tim to preload during tim rollover.
                               (Note: reading this register is equivalent to reading the tim32 register.)
prldx            R             Preload Value.
                                 31-16  Value of prd
                                 15-0   Value of tddr
tim32x           R             Actual time Value (31-0)
                                 31-16  Value of tim
                                 15-0   Value of psc

Note: x designates the timer number, 0 or 1.



[0198] The timers are count-down timers composed of 2 counters: the timer pre-scaler, psc, which is pre-loaded
from tddr and counts down every sys_clock; and the timer counter, tim (pre-loaded from prd). When psc = 0, it
pre-loads itself and decrements tim by one. This divides the sys_clock by the following values:
(tddr + 1) * (prd + 1), if tddr and prd are not both 0, or
2, if tddr and prd are both 0.

[0199] When tim = 0 and psc = 0, the timer will issue an interrupt if the corresponding tint_mask is not set. Then both
counters are pre-loaded if soft = 0. If soft is 1, the timer stops counting.
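
The divide ratio given above translates directly into code. In the sketch below the sys_clock frequency is an assumption: the text does not state it here, and 40.5 MHz is used only because it roughly reproduces the quoted 106 second maximum interval for two 16-bit counters. The function names are hypothetical.

```python
SYS_CLOCK_HZ = 40_500_000  # assumption: not stated in the text, chosen to match ~106 s


def timer_divisor(tddr, prd):
    """Number of sys_clock cycles between timer interrupts."""
    if tddr == 0 and prd == 0:
        return 2
    return (tddr + 1) * (prd + 1)


def timer_interval_seconds(tddr, prd, sys_clock_hz=SYS_CLOCK_HZ):
    """Interval between interrupts for the given pre-scaler and period values."""
    return timer_divisor(tddr, prd) / sys_clock_hz
```

With both registers at 0xFFFF the divisor is 2^32, which at the assumed clock gives an interval of about 106 seconds.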
[0200] The timer control register (tcr) can override normal timer operations. The timer reload bit, trb, causes both
counters to pre-load, while the timer stop bit, tss, causes both counters to stop.

[0201] The two general purpose 2-wire UARTs are asynchronous mode, full duplex, double buffered UARTs with 8 byte
FIFOs that operate at up to 28.8 kbps. They transmit/receive 1 start bit, 7 or 8 data bits, optional parity, and 1
or 2 stop bits.

[0202] The UARTs are fully accessible to the API and can generate interrupts when data is received or the transmit
buffer is empty. The ARM also has access to a status register for each UART that contains flags for such errors as
data overrun and framing errors.
[0203] The IR/RF remote control interface is a means of transmitting user commands to the set top box. This interface
consists of a custom hardware receiver implementing a bit frame-based communication protocol. A single bit frame
represents a user command.

[0204] The bit frame is defined in three possible lengths of 12, 15 or 20 bits. The on/off values of the bits in the frame
are represented by two different length pulse widths. A 'one' is represented by a pulse width of 1.2 ms and a 'zero' is
represented by a 0.6 ms pulse width. The example in Figure 20 shows the IR input bitstream. The bitstream is assumed
to be free of any carrier (36-48 KHz typical) and represents a purely digital bitstream in return-to-zero format. The
hardware portion of this interface is responsible for determining the bit value along with capturing the bit stream and
placing the captured value into a read register for the software interface to access. Each value placed in the read
register will generate an interrupt request.

[0205] Each user command is transmitted as a single bit frame and each frame is transmitted a minimum of three
times. The hardware interface is responsible for recognizing frames and filtering out unwanted frames. For a bit frame
to be recognized by the hardware interface it must pass the following steps: first it must match the expected frame
size, 12, 15 or 20 bits; then two of the minimum three frames received must match in value. A frame match when
detected by the hardware interface will generate only one interrupt request.
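
The two recognition steps just described (frame length check, then a two-out-of-three value match) can be sketched in software as follows. The 0.9 ms decision threshold between the 0.6 ms and 1.2 ms pulse widths is an assumption (the midpoint), as are the function names; the hardware receiver itself is not described at this level of detail.

```python
VALID_FRAME_LENGTHS = (12, 15, 20)


def decode_pulses(pulse_widths_ms, threshold_ms=0.9):
    """Convert pulse widths to bits: about 1.2 ms is a 'one', about 0.6 ms a 'zero'.

    threshold_ms is an assumed midpoint, not a value given in the text.
    """
    return [1 if width > threshold_ms else 0 for width in pulse_widths_ms]


def accept_command(frames):
    """Return the accepted frame (a list of bits) or None.

    Only the first three received frames are considered; a frame is accepted
    if it has a valid length and at least two of those frames carry its value.
    """
    candidates = [tuple(f) for f in frames[:3] if len(f) in VALID_FRAME_LENGTHS]
    for frame in candidates:
        if candidates.count(frame) >= 2:
            return list(frame)
    return None
```

A corrupted middle repetition is therefore tolerated, while three mutually different frames, or frames of an unexpected length, are filtered out.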

[0206] The IR/RF protocol has one receive interrupt, but it is generated to indicate two different conditions. The two
different conditions are start and finish of a user command. The first type of receive interrupt (start) is generated when
the hardware interface detects a new frame (remember 2 out of three frames must match). The second type of interrupt
is generated when there has been no signal detected for the length of a hardware time out period (user command time
out). Each frame, when transmitted, is considered to be continuous or repeated. So although there is a three frame
minimum for a user command, the protocol is that when a start interrupt is received the interface will assume that until
a finish (time out) interrupt is generated the same frame is being received.

[0207] In a typical receive sequence, the interface has been dormant and the hardware
interface detects a signal that is recognized as a frame. This is considered the start of a user command, and a start
interrupt is issued by the hardware interface. The finish of a user command is considered to be when no signal has
been detected by the hardware interface for a time out period of approximately 100 ms. The finish will be
indicated by an interrupt from the hardware interface.
[0208] During a receive sequence it is possible to receive several start interrupts before receiving a finish interrupt.
Several start interrupts may be caused by the user entering several commands before the time out period has expired.
Each of these commands entered by the user would be a different command. A new user command can be accepted
before the previous command time out.
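
The start/finish behaviour described above can be modelled as a small state tracker. The sketch below is an assumption-laden illustration: the class `ReceiveTracker`, its method names, and the millisecond time base are hypothetical, and the 100 ms constant is the approximate figure given in the text.

```python
TIMEOUT_MS = 100  # approximate user command time out from the text


class ReceiveTracker:
    """Hypothetical software model of the start/finish receive interrupts."""

    def __init__(self):
        self.current = None         # frame assumed to be repeating
        self.last_signal_ms = None  # time of the most recent recognized frame
        self.events = []            # ("start" | "finish", frame value)

    def on_frame(self, value, now_ms):
        """A recognized frame arrived; a different value starts a new command."""
        if value != self.current:
            self.current = value
            self.events.append(("start", value))
        self.last_signal_ms = now_ms

    def on_tick(self, now_ms):
        """Periodic check; emits finish once no signal was seen for the time out."""
        if self.current is not None and now_ms - self.last_signal_ms >= TIMEOUT_MS:
            self.events.append(("finish", self.current))
            self.current = None
```

Repeated frames with the same value produce no further start events, while a different value arriving before the time out produces a second start, matching the case of several start interrupts before one finish.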

[0209] The IR, SIRCSI, and RF inputs share common decoding logic. Figure 21 shows a theoretical model of the
hardware interface. There are three possible inputs, SIRCSI, IR and RF, and one output, SIRCSO. The IR receiver
receives its input from the remote control transmitter, while the SIRCSI receives its input from another device's SIRCSO.
As Figure 21 shows, in normal operation the IR is connected to the SIRCSO and the decoder.
The SIRCSI signal has priority over the IR and will override any IR signal in progress. If a SIRCSI signal is detected,
the hardware interface will switch the input stream from IR to SIRCSI, and the SIRCSI will be routed to the decoder
and the SIRCSO.
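
The priority rule can be illustrated with a short routing function. This is a sketch only: the function name `route_input`, the use of `None` for an idle input, and the returned dictionary are assumptions for the example, and the RF path is left out.

```python
def route_input(ir_signal, sircsi_signal):
    """Pick the active input and route it to both the decoder and SIRCSO.

    Each argument is the signal on that input, or None when the input is idle.
    SIRCSI takes priority over IR, as described in the text.
    """
    if sircsi_signal is not None:
        source, signal = "SIRCSI", sircsi_signal
    elif ir_signal is not None:
        source, signal = "IR", ir_signal
    else:
        return None  # nothing to route
    return {"source": source, "decoder": signal, "SIRCSO": signal}
```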

[0210] There are two possible inputs for the IR frame type and one input for the RF frame type. The user must
select whether the received frame type is going to be IR or RF. The IR/RF interface contains two 32-bit data
registers, one for received data (IRRF Data Decode register) and one for data to be written out (IRRF Encode Data
register). In both registers, bits 31-20 are not used and are set to 0.
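
Since bits 31-20 are unused, only the low 20 bits of either register carry frame data, which is enough for the largest (20 bit) frame. A minimal sketch of the corresponding mask, with the hypothetical names `IRRF_DATA_MASK` and `irrf_register_value`:

```python
IRRF_DATA_MASK = 0x000FFFFF  # bits 19-0; bits 31-20 are unused and read as 0


def irrf_register_value(raw):
    """Return the 32-bit register content with the unused bits 31-20 cleared."""
    return raw & IRRF_DATA_MASK
```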

[0211] The 'AV310 has two general purpose I/O pins (IO1 and IO2) which are user configurable. Each I/O port has
its own 32-bit control/status register, iocsr1 or iocsr2.

[0212] If an I/O is configured as an input and the delta interrupt mask is cleared, an ARM interrupt is generated
whenever an input changes state. If the delta interrupt mask is set, interrupts to the ARM are disabled. If no other
device drives the I/O pin while it is configured as an input, it will be held high by an internal pull-up resistor.

[0213] If an I/O is configured as an output (by setting the cio bit in the corresponding control/status register), the
value contained in the io_out bit of the control/status register is output. Interrupt generation is disabled when an I/O is
configured as an output.

[0214] The definition of the control/status registers is given in Table 20.

Table 20. I/O Control/Status Registers

Bit Number   Name       Description
31-4         Reserved   Set to 0 (read only)
3            io_in      input sample value (read only)
2            dim        delta interrupt mask:
                          0 = generate interrupts
                          1 = mask interrupts
1            cio        configure i/o:
                          0 = input
                          1 = output
0            io_out     output value if cio is 1
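
The bit layout of the I/O control/status registers can be decoded as sketched below. The bit positions follow the register definition (io_out = bit 0, cio = bit 1, dim = bit 2, io_in = bit 3); the Python names and the helper functions are assumptions for illustration.

```python
IO_OUT = 1 << 0  # output value, used when cio is 1
CIO = 1 << 1     # configure i/o: 0 = input, 1 = output
DIM = 1 << 2     # delta interrupt mask: 0 = generate, 1 = mask
IO_IN = 1 << 3   # input sample value (read only)


def decode_iocsr(value):
    """Split a 32-bit control/status register value into its named fields."""
    return {
        "io_out": bool(value & IO_OUT),
        "is_output": bool(value & CIO),
        "interrupts_masked": bool(value & DIM),
        "io_in": bool(value & IO_IN),
    }


def delta_interrupt_enabled(value):
    """Interrupts occur only for an input pin whose delta mask is cleared."""
    return not (value & CIO) and not (value & DIM)
```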



[0215] The 'AV310 includes an I2C serial bus interface that can act as either a master or a slave. (Master mode is the
default.) In master mode, the 'AV310 initiates and terminates transfers and generates clock signals.
[0216] To put the device in slave mode, the ARM must write to a control register in the block. The API must set the
slave mode select and a 7-bit address for the 'AV310. It must also send a software reset to the I2C to complete the
transition to slave mode.

[0217] In slave mode, when the programmable address bits match the applied address, the 'AV310 will respond
accordingly. The 'AV310 will also respond to general call commands issued to address 0 (the general call address)
that change the programmable part of the slave address. These commands are 0x04 and 0x06. No other general call
commands will be acknowledged, and no action will be taken.
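
The slave-mode addressing rules can be summarised in a few lines. This is a behavioural sketch under stated assumptions, not the device's logic: the function name `slave_acknowledges` and its parameters are hypothetical, and only the acknowledge decision is modelled, not the address change the general call commands trigger.

```python
GENERAL_CALL_ADDRESS = 0x00
ACCEPTED_GENERAL_CALLS = {0x04, 0x06}  # the two commands named in the text


def slave_acknowledges(own_address, applied_address, command=None):
    """Return True if a slave with the given 7-bit address would respond.

    command is the general call command byte, used only for address 0.
    """
    if applied_address == GENERAL_CALL_ADDRESS:
        # Only the two listed general call commands are acknowledged.
        return command in ACCEPTED_GENERAL_CALLS
    # Otherwise the programmable 7-bit address must match.
    return applied_address == (own_address & 0x7F)
```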

[0218] The circuitry is presently preferably packaged in a 240 pin PQFP. Table 21 is a list of pin signal names and
their descriptions. Other pin outs may be employed to simplify the design of emulation, simulation, and/or software
debugging platforms employing this circuitry.

Table 21.

Signal Name        #    I/O      Description

Transport Parser
DATAIN[7:0]*       8    I        Data Input. Bit 7 is the first bit in the transport stream.
DCLK*              1    I        Data Clock. The maximum frequency is 7.5 MHz.
PACCLK*            1    I        Packet Clock. Indicates valid packet data on DATAIN.
BYTE_STRT*         1    I        Byte Start. Indicates the first byte of a transport packet
                                 for DVB. Tied low for DSS.
DERROR*            1    I        Data Error, active high. Indicates an error in the input
                                 data. Tie low if not used.
CLK27*             1    I        27 MHz Clock input from an external VCXO.
VCXO_CTRL*         1    O        VCXO Control. Digital pulse output for external VCXO.
CLK_SEL            1    I        Clock select. CLK_SEL low selects a 27 MHz input
                                 clock. When high, selects an 81 MHz input clock.

Communications Processor
IR*                1    I        Infra-Red sensor input
RF*                1    I        RF sensor input
SIRCSI*            1    I        SIRCS control input
SIRCSO*            1    O        SIRCS control output
UARTDI1*           1    I        UART Data Input, port 1
UARTDO1*           1    O        UART Data Output, port 1
UARTDI2*           1    I        UART Data Input, port 2
UARTDO2*           1    O        UART Data Output, port 2
PDATA              8    I/O      1394 Interface Data Bus
PWRITE             1    O        1394 Interface Write Signal
PREAD              1    O        1394 Interface Read Signal
PPACEN             1    I/O      1394 Interface Packet Data Enable
PREADREQ           1    I        1394 Interface Read Data Request
PERROR             1    I/O      1394 Interface Error Flag
IIC_SDA*           1    I/O      I2C Interface Serial Data
IIC_SCL*           1    I/O      I2C Interface Serial Clock
IO1*               1    I/O      General Purpose I/O
IO2*               1    I/O      General Purpose I/O

Extension Bus
EXTR/W             1    O        Extension Bus Read/Write. Selects read when high,
                                 write when low.
EXTWAIT            1    I        Extension Bus Wait Request, active low, open drain
EXTADDR[24:0]      25   O        Extension Address bus: byte address
EXTDATA[15:0]      16   I/O      Extension Data bus
EXTINT[2:0]        3    I        External Interrupt requests (three)
EXTACK[2:0]        3    O
CLK40              1    O        40.5 MHz Clock output for extension bus and 1394
                                 interface
CS1                1    O        Chip Select 1. Selects EEPROM, 32M byte maximum
                                 size.
CS2                1    O        Chip Select 2. Selects external DRAM.
CS3                1    O        Chip Select 3. Selects the modem.
CS4                1    O        Chip Select 4. Selects the front panel.
CS5                1    O        Chip Select 5. Selects front end control.
CS6                1    O        Chip Select 6. Selects the 1394 interface.
CS7                1    O        Chip Select 7. Selects the parallel data port.
RAS                1    O        DRAM Row Address Strobe
UCAS               1    O        DRAM Column address strobe for upper byte
LCAS               1    O        DRAM Column address strobe for lower byte
SMIO               1    I/O      Smart Card Input/Output
SMCLK              1    O        Smart Card Output Clock
SMCLK2             1    I        Smart Card Input Clock, 36.8 MHz
SMDETECT           1    I        Smart Card Detect, active low
SMRST              1    O        Smart Card Reset
SMVPPEN            1    O        Smart Card Vpp enable
SMVCCDETECT*       1    I        Smart Card Vcc detect. Signals whether the Smart
                                 Card Vcc is on.
SMVCCEN            1    O        Smart Card Vcc enable

Audio Interface
AUD_PLLI*          1    I        Input Clock for Audio PLL
AUD_PLLO           1    O        Control Voltage for external filter of Audio PLL
PCM_SRC            1    I        PCM Clock Source Select. Indicates whether the
                                 PCM clock is input to or generated by the 'AV310.
PCMDATA*           1    O        PCM Data audio output.
LRCLK*             1    O        Left/Right Clock for output PCM audio data.
PCMCLK*            1    I or O   PCM Clock.
[illegible]        1    O        Audio Serial Data Clock
SPDIF*             1    O        SPDIF audio output

Digital Video Interface
YCOUT[7:0]         8    O        4:2:2 or 4:4:4 digital video output
YCCLK              1    O        27 or 40.5 MHz digital video output clock
YCCTRL[1:0]        2    O        Digital video output control signal

NTSC/PAL Encoder Interface
NTSC/PAL           1    I        NTSC/PAL select. Selects NTSC output when high,
                                 PAL output when low.
SYNCSEL            1    I        Sync signal select. When low, selects internal sync
                                 generation. When high, VSYNC and HSYNC are
                                 inputs.
VSYNC              1    I or O   Vertical synchronization signal
HSYNC              1    I or O   Horizontal synchronization signal
YOUT               1    O        Y signal Output
BIASY              1    I        Y D/A Bias-capacitor terminal
[illegible]        1    O        C signal Output
BIASC              1    I        C D/A Bias-capacitor terminal
COMPOUT            1    O        Composite signal Output
BIASCOMP           1    I        Composite Bias-capacitor terminal
IREF               1    I
COMP
VREF               1    I        [illegible]

SDRAM Interface
SDATA[15:0]        16   I/O      SDRAM Data bus
SADDR[11:0]        12   O        SDRAM Address bus
SRAS               1    O        SDRAM Row Address Strobe
SCAS               1    O        SDRAM Column Address Strobe
SWE                1    O        SDRAM Write Enable
[illegible]        1    O        SDRAM Data Mask Enable, Upper byte
SDQML              1    O        SDRAM Data Mask Enable, Lower byte
SCLK               1    O        SDRAM Clock
SCKE               1    O        SDRAM Clock Enable
SCS1               1    O        SDRAM Chip Select 1
SCS2               1    O        SDRAM Chip Select 2

Device Control
RESET*             1    I        Reset, active low
TDI*               1    I        JTAG Data Input. Can be tied high or left floating.
TCK*               1    I        JTAG Clock. Must be tied low for normal operation.
TMS*               1    I        JTAG Test Mode Select. Can be tied high or left floating.
TRST*              1    I        JTAG Test Reset, active low. Must be tied low or
                                 connected to RESET for normal operations.
TDO*               1    O        JTAG Data Output

Reserved           3             Reserved for Test
VCC / GND          10            Analog supply
VCC / GND          44            Digital supply

* indicates a 5 volt tolerant pin



[0219] Fabrication of data processing devices 1000 and 2000 involves multiple steps of implanting various amounts
of impurities into a semiconductor substrate and diffusing the impurities to selected depths within the substrate to form
transistor devices. Masks are formed to control the placement of the impurities. Multiple layers of conductive material
and insulative material are deposited and etched to interconnect the various devices. These steps are performed in a
clean room environment.

[0220] A significant portion of the cost of producing the data processing device involves testing. While in wafer form,
individual devices are biased to an operational state and probe tested for basic operational functionality. The wafer is
then separated into individual dice which may be sold as bare die or packaged. After packaging, finished parts are
biased into an operational state and tested for operational functionality.

[0221] An alternative embodiment of the novel aspects of the present invention may include other circuitries, which
are combined with the circuitries disclosed herein in order to reduce the total gate count of the combined functions.
Since those skilled in the art are aware of techniques for gate minimization, the details of such an embodiment will not
be described herein.

[0222] As used herein, the terms "applied," "connected," and "connection" mean electrically connected, including
where additional elements may be in the electrical connection path.

[0223] While the invention has been described with reference to illustrative embodiments, this description is not
intended to be construed in a limiting sense. Various other embodiments of the invention will be apparent to persons
skilled in the art upon reference to this description. It is therefore contemplated that the appended claims will cover
any such modifications of the embodiments as fall within the true scope and spirit of the invention.



Claims 


1. A method of decoding video containing predicted frames, comprising the steps of:

(a) decoding a macroblock at either a first resolution or a second resolution depending upon assessment of
said macroblock.

2. The method of claim 1, wherein:

(a) said macroblock has an associated motion vector.